Abstract

An unknown $m$ by $n$ matrix $X_0$ is to be estimated from noisy measurements $Y = X_0 + Z$, where the noise matrix $Z$ has i.i.d. Gaussian entries. A popular matrix denoising scheme solves the nuclear norm penalization problem $\min_X \|Y - X\|_F^2/2 + \lambda \|X\|_*$, where $\|X\|_*$ denotes the nuclear norm (sum of singular values). This is the analog, for matrices, of $\ell_1$ penalization in the vector case. It has been empirically observed that, if $X_0$ has low rank, it may be recovered quite accurately from the noisy measurement $Y$.

In a proportional growth framework where the rank $r_n$, number of rows $m_n$ and number of columns $n$ all tend to $\infty$ proportionally to each other ($r_n/m_n \to \rho$, $m_n/n \to \beta$), we evaluate the asymptotic minimax MSE

$$ \mathcal{M}(\rho,\beta) = \lim_{m_n, n \to \infty} \inf_{\lambda} \sup_{\mathrm{rank}(X) \le r_n} \mathrm{MSE}(X, \hat{X}_\lambda) . $$

Our formulas involve incomplete moments of the quarter- and semi-circle laws ($\beta = 1$, square case) and the Marcenko-Pastur law ($\beta < 1$, non square case). We also show that any least-favorable matrix $X_0$ has norm "at infinity".

The nuclear norm penalization problem is solved by applying soft thresholding to the singular values of $Y$. We also derive the minimax threshold, namely the value $\lambda^*(\rho)$ which is the optimal place to threshold the singular values.

All these results are obtained for general (non square, non symmetric) real matrices. Comparable results are obtained for square symmetric nonnegative-definite matrices.

1 Introduction

Suppose we observe a single noisy matrix $Y$, generated by adding noise $Z$ to an unknown matrix $X_0$, so that $Y = X_0 + Z$, where $Z$ is a noise matrix. We wish to recover the matrix $X_0$ with some bound on the mean squared error (MSE). This is hopeless in case $X_0$ is a completely general matrix and the noise $Z$ is arbitrary; but in case $X_0$ happens to be of relatively low rank and the noise matrix is i.i.d. standard Gaussian, one can indeed guarantee quantitatively accurate recovery. This paper provides explicit formulas for the best possible guarantees obtainable by a popular, computationally practical procedure.

Specifically, let $Y$, $X_0$ and $Z$ be $m$-by-$n$ matrices and suppose that $Z$ has i.i.d. entries, $Z_{i,j} \sim \mathcal{N}(0,1)$. Consider the following Nuclear-Norm Penalization (NNP) problem:

$$ (\mathrm{NNP}) \qquad \hat{X}_\lambda = \operatorname{argmin}_{X \in M_{m \times n}} \ \tfrac{1}{2} \|Y - X\|_F^2 + \lambda \|X\|_* , \tag{1} $$

where $\|X\|_*$ denotes the sum of singular values of $X \in M_{m \times n}$, also known as the nuclear norm, and $\lambda > 0$ is a penalty factor. A solution to (NNP) is efficiently computable by modern convex optimization software [1]; it shrinks away from $Y$ in the direction of smaller nuclear norm.

Measure performance (risk) by mean-squared error (MSE). When the unknown $X_0$ is of known rank $r$ and belongs to a matrix class $\mathbf{X}_{m,n} \subset M_{m \times n}$, the minimax MSE of NNP is

$$ \mathcal{M}_{m,n}(r|\mathbf{X}) = \inf_{\lambda} \sup_{X_0 \in \mathbf{X}_{m,n},\ \mathrm{rank}(X_0) \le r} \frac{1}{mn} \mathbb{E}_{X_0} \left\| \hat{X}_\lambda(X_0 + Z) - X_0 \right\|_F^2 , \tag{2} $$

namely the worst-case risk of $\hat{X}_{\lambda_*}$, where $\lambda_*$ is the threshold for which this worst-case risk is the smallest possible. For square matrices, $m = n$, we write $\mathcal{M}_n(r|\mathbf{X})$ instead of $\mathcal{M}_{n,n}(r|\mathbf{X})$. In a very clear sense $\mathcal{M}_{m,n}(r|\mathbf{X})$ gives the best possible guarantee for the MSE of NNP, based solely on the rank and problem size, and not on other properties of the matrix $X_0$.

1.1 Minimax MSE Evaluation 

In this paper, we calculate the minimax MSE $\mathcal{M}_{m,n}(r|\mathbf{X})$ for two matrix classes $\mathbf{X}$:

1. General Matrices: $\mathbf{X} = Mat_{m,n}$. The signal $X_0$ is a real matrix $X_0 \in M_{m \times n}$ ($m \le n$).

2. Symmetric Matrices: $\mathbf{X} = Sym_n$. The signal $X_0$ is a real, symmetric positive semidefinite matrix $X_0 \in S^n_+ \subset M_{n \times n}$.

In both cases, the asymptotic MSE (AMSE) in the "large $n$" asymptotic setting admits considerably simpler and more accessible formulas than the minimax MSE for finite $n$. So in addition to the finite-$n$ minimax MSE, we study the asymptotic setting where a sequence of problem size triples $(r_n, m_n, n)$ is indexed by $n \to \infty$, and where, along this sequence, $m/n \to \beta \in (0,1)$ and $r/m \to \rho \in (0,1)$. We think of $\beta$ as the matrix shape parameter; $\beta = 1$ corresponds to a square matrix, and $\beta < 1$ to a matrix wider than it is tall. We think of $\rho$ as the fractional rank parameter, with $\rho \approx 0$ implying low rank relative to matrix size. Using these notions we can define the asymptotic minimax MSE (AMSE)

$$ \mathcal{M}(\rho, \beta|\mathbf{X}) = \lim_{n \to \infty} \mathcal{M}_{m_n,n}(r_n|\mathbf{X}) . $$

We obtain explicit formulas for the asymptotic minimax MSE in terms of incomplete moments of classical probability distributions: the quarter-circle and semi-circle laws (square case $\beta = 1$) and the Marcenko-Pastur distribution (non-square case $\beta < 1$). Figures 1 and 2 show how the AMSE depends on the matrix class $\mathbf{X}$, the rank fraction $\rho$, and the shape factor $\beta$. We also give explicit formulas for the optimal regularization parameter $\lambda_*$, also as a function of $\rho$; see Figures 3 and 4.

These minimax MSE results constitute best possible guarantees, in the sense that for the procedure in question, the MSE is actually attained at some rank $r$ matrix, so that no better guarantee is possible for the given tuning parameter $\lambda_*$; but also, no other tuning parameter offers a better such guarantee.

1.2 Motivations 

We see four reasons to develop these bounds. 

1.2.1 Applications 

Several important problems in modern signal and image processing, in network data 
analysis, and in computational biology can be cast as recovery of low rank matrices 
from noisy data, and nuclear norm minimization has become a popular strategy in 
many cases; see for example [2, 3] and references therein. Our results provide sharp
limits on what such procedures can hope to achieve, and validate rigorously the idea 
that low rank alone is enough to provide some level of performance guarantee; in fact, 
they precisely quantify the best possible guarantee. 

1.2.2 Limits on Possible Improvements 

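One might wonder whether some other procedure offers even better guarantees than NNP. Consider then the minimax risk over all procedures

$$ \mathcal{M}^*_{m,n}(r|\mathbf{X}) = \inf_{\hat{X}} \sup_{X_0 \in \mathbf{X}_{m,n},\ \mathrm{rank}(X_0) \le r} \frac{1}{mn} \mathbb{E}_{X_0} \left\| \hat{X}(X_0 + Z) - X_0 \right\|_F^2 , \tag{3} $$

where $\hat{X} = \hat{X}(Y)$ is some measurable function of the observations. Here one wants to find the best possible procedure, without regard to efficient computation. We also prove a lower bound on the minimax MSE over all procedures, and provide an asymptotic evaluation

$$ \mathcal{M}^*(\rho, \beta|\mathbf{X}) \ge \mathcal{M}^-(\rho, \beta) = \rho + \beta\rho - \beta\rho^2 . $$

In the square case ($\beta = 1$), this simplifies to $\mathcal{M}^*(\rho|\mathbf{X}) \ge \mathcal{M}^-(\rho) = \rho(2 - \rho)$. The NNP-minimax MSE is by definition at least as large as the minimax MSE: $\mathcal{M}(\rho, \beta|\mathbf{X}) \ge \mathcal{M}^*(\rho, \beta|\mathbf{X})$. While there may be procedures outperforming NNP, the performance improvement turns out to be limited. Indeed, our formulas show that

$$ \frac{\mathcal{M}(\rho, \beta|\mathbf{X})}{\mathcal{M}^-(\rho, \beta)} \le 2 \left( 1 + \frac{\sqrt{\beta}}{1 + \beta} \right) , \tag{4} $$

while

$$ \lim_{\rho \to 0} \frac{\mathcal{M}(\rho, \beta|\mathbf{X})}{\mathcal{M}^-(\rho, \beta)} = 2 \left( 1 + \frac{\sqrt{\beta}}{1 + \beta} \right) . \tag{5} $$

For square matrices ($\beta = 1$), this simplifies to

$$ \frac{\mathcal{M}(\rho|\mathbf{X})}{\mathcal{M}^-(\rho)} \le 3 , \qquad \lim_{\rho \to 0} \frac{\mathcal{M}(\rho|\mathbf{X})}{\mathcal{M}^-(\rho)} = 3 . $$

In words, the potential improvement in minimax AMSE of any other matrix denoising procedure over NNP is at most a factor of 3; and if any such improvement were available, it would only be available in extreme low rank situations. Actually obtaining such an improvement in performance guarantees is an interesting research challenge.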
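To make the size of this gap concrete, the following minimal sketch evaluates the lower bound $\rho + \beta\rho - \beta\rho^2$ and the small-$\rho$ limiting ratio, assuming the ratio takes the form $2(1 + \sqrt{\beta}/(1+\beta))$, which reduces to the factor of 3 quoted for $\beta = 1$. The function names are illustrative and not taken from the paper.

```python
import math

def lower_bound(rho: float, beta: float) -> float:
    """Lower bound rho + beta*rho - beta*rho^2 on the global minimax AMSE."""
    return rho + beta * rho - beta * rho ** 2

def limiting_ratio(beta: float) -> float:
    """Assumed small-rho limit of (NNP minimax AMSE) / (lower bound)."""
    return 2.0 * (1.0 + math.sqrt(beta) / (1.0 + beta))

# Illustrative values of the shape parameter beta.
for beta in (0.25, 0.5, 1.0):
    print(beta, lower_bound(0.1, beta), limiting_ratio(beta))
```

Under this assumption the ratio is largest for square matrices, so the factor of 3 is the worst case over $\beta \in (0, 1]$.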

1.2.3 Parallels in Minimax Decision Theory 

The low-rank matrix denoising problem stands in a line of now-classical problems in minimax decision theory. Consider the sparse vector denoising problem, where an unknown vector $x$ of interest yields noisy observations $y = x + z$ with noise $z_i \sim_{i.i.d.} \mathcal{N}(0,1)$; the vector $x$ is sparsely nonzero, $\#\{i : x(i) \ne 0\} \le \epsilon \cdot n$, with $z$ and $x$ independent. In words, a vector with a fraction $\le \epsilon$ of non zeros is observed with noise. In this setting, consider the following $\ell_1$-norm penalization problem:

$$ (P_1) \qquad \hat{x}_\lambda = \operatorname{argmin}_{x \in \mathbb{R}^n} \ \tfrac{1}{2} \|y - x\|_2^2 + \lambda \|x\|_1 . \tag{6} $$

The sparse vector denoising problem exhibits several striking structural resemblances to low-rank matrix denoising:

• Thresholding Representation. For scalar $y$ define the soft thresholding nonlinearity by

$$ \eta_\lambda(y) = \mathrm{sign}(y) \cdot (|y| - \lambda)_+ . $$

In words, values larger in magnitude than $\lambda$ are shifted towards zero by $\lambda$ while those smaller than $\lambda$ are set to zero. The solution vector $\hat{x}_\lambda$ of $(P_1)$ obeys $(\hat{x}_\lambda)_i = \eta_\lambda(y_i)$, namely, it applies $\eta_\lambda$ coordinate-wise. Similarly, the solution of (NNP), written in the SVD basis for $Y$, obeys $\hat{X}_\lambda = \mathrm{diag}(\eta_\lambda(\mathrm{diag}(Y)))$; it applies $\eta_\lambda$ coordinate-wise to the singular values of the noisy matrix $Y$.

Remark: by this observation, $(P_1)$ can also be called "soft thresholding" or "soft threshold denoising", and in fact these other terms are the labels in common use. Similarly, NNP amounts to "soft thresholding of singular values"; this paper will henceforth use the term Singular Value Soft Thresholding (SVST).

• Sparsity/Low Rank Analogy. The objects to be recovered in the sparse vector denoising problem have sparse entries; those to be recovered in the low rank matrix denoising problem have sparse singular values. Thus the fractional sparsity parameter $\epsilon$ is analogous to the fractional rank parameter $\rho$. It is natural to ask the same questions about behavior of minimax MSE in one setting (say, asymptotics as $\rho \to 0$) as in the other setting ($\epsilon \to 0$). In fact, such comparisons turn out to be illuminating.

• Structure of the Least-Favorable Estimand. Among sparse vectors $x$ of a given fixed sparsity fraction $\epsilon$, which of those is the hardest to estimate? This should maximize the mean-squared error of soft thresholding, even under the most clever choice of $\lambda$. This least-favorable configuration is singled out in the minimax AMSE

$$ M_n(\epsilon) = \inf_{\lambda} \sup_{\#\{i : x(i) \ne 0\} \le \epsilon \cdot n} \frac{1}{n} \mathbb{E} \|\hat{x}_\lambda - x\|_2^2 . \tag{7} $$

In this min/max, the least favorable situation has all its non zeros in some sense "at infinity"; i.e. all sparse vectors which place large enough values on the nonzeros are nearly least favorable, i.e. essentially make the problem maximally difficult for the estimator, even when it is optimally tuned. In complete analogy, in low-rank matrix denoising we will see that all low rank matrices which are in an appropriate sense "sufficiently large", are thereby almost least favorable.

• Structure of the Minimax Smoothing Parameter. In the sparse vector denoising AMSE (7) the $\lambda = \lambda(\epsilon)$ achieving the infimum is a type of optimal regularization parameter, or optimal threshold. It decreases as $\epsilon$ increases, with $\lambda(\epsilon) \to 0$ as $\epsilon \to 1$. Paralleling this, we show that the low-rank matrix denoising AMSE (2) has minimax singular value soft threshold $\lambda_*(\rho)$ decreasing as $\rho$ increases, and $\lambda_*(\rho) \to 0$ as $\rho \to 1$.
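The thresholding representation in the first bullet above can be illustrated with a minimal NumPy sketch. It implements the scalar rule $\mathrm{sign}(y)(|y| - \lambda)_+$ entry-wise and evaluates the $\ell_1$-penalized objective so the minimizing property can be spot-checked; the function names and the sample vector are illustrative assumptions, not part of the paper.

```python
import numpy as np

def soft_threshold(y, lam):
    """Entry-wise soft thresholding: sign(y) * max(|y| - lam, 0)."""
    y = np.asarray(y, dtype=float)
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def l1_objective(x, y, lam):
    """Objective of the l1-penalized problem: 0.5*||y - x||_2^2 + lam*||x||_1."""
    return 0.5 * np.sum((y - x) ** 2) + lam * np.sum(np.abs(x))

# Illustrative data: entries below the threshold in magnitude are set to zero,
# the others are pulled towards zero by lam.
y = np.array([3.0, -0.4, 1.5, -2.2, 0.0])
lam = 1.0
x_hat = soft_threshold(y, lam)
print(x_hat, l1_objective(x_hat, y, lam))
```

Because the objective is convex and separable across coordinates, no perturbation of `x_hat` should lower it, which is an easy numerical sanity check.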

Despite these similarities, there is one major difference between sparse vector denoising and low-rank matrix denoising. In the sparse vector denoising problem, the soft-thresholding minimax MSE was compared to the minimax MSE over all procedures by Donoho and Johnstone [4]. Let $M(\epsilon) = \lim_{n \to \infty} M_n(\epsilon)$ denote the soft thresholding AMSE and define the minimax AMSE over all procedures via

$$ M^*(\epsilon) = \lim_{n \to \infty} \inf_{\hat{x}} \sup_{\#\{i : x(i) \ne 0\} \le \epsilon \cdot n} \frac{1}{n} \mathbb{E} \|\hat{x} - x\|_2^2 , $$

where here $\hat{x} = \hat{x}(y)$ denotes any procedure which is measurable in the observations. In the limit of extreme sparsity, soft thresholding is asymptotically minimax:

$$ \frac{M(\epsilon)}{M^*(\epsilon)} \to 1 \quad \text{as } \epsilon \to 0 . $$

Breaking the chain of similarities, we are not able to show a similar asymptotic minimaxity for SVST in the low rank matrix denoising problem. Although eq. (4) says that soft thresholding of singular values is asymptotically not more than a factor of 3 suboptimal, we doubt that anything better than a factor of 3 can be true; specifically, we conjecture that SVST suffers a minimaxity gap. For example, for $\beta = 1$, we conjecture that

$$ \frac{\mathcal{M}(\rho|\mathbf{X})}{\mathcal{M}^*(\rho|\mathbf{X})} \to 3 \quad \text{as } \rho \to 0 . $$

We believe that interesting new estimators will be found improving upon singular value soft thresholding by essentially this factor of 3. Namely: there may be substantially better guarantees to be had under extreme sparsity than those which can be offered by SVST. Settling the minimaxity gap for SVST seems a challenging new research question.

1.2.4 Indirect Observations. 

Evaluating the Minimax MSE of SVST has an intriguing new motivation, arising from the newly evolving fields of compressed sensing and matrix completion.

Consider the problem of recovering an unknown matrix $X_0$ from noiseless, indirect measurements. Let $\mathcal{A} : \mathbb{R}^{m \times n} \to \mathbb{R}^p$ be a linear operator, and consider observations

$$ y = \mathcal{A}(X_0) . $$

In words, $y \in \mathbb{R}^p$ contains $p$ linear measurements of the matrix object $X_0$. Can we recover $X_0$? It may seem that $p \ge mn$ measurements are required, and in general this would be true; but if $X_0$ happens to be of low rank, and $\mathcal{A}$ has suitable properties, we may need substantially fewer measurements.

Consider reconstruction by Nuclear Norm Minimization (NNM):

$$ (P_{nuc}) \qquad \min \|X\|_* \quad \text{subject to } y = \mathcal{A}(X) . \tag{8} $$

Recht and co-authors found that when the matrix representing the operator $\mathcal{A}$ has i.i.d. $\mathcal{N}(0,1)$ entries, and the matrix is of rank $r$, the matrix $X_0$ is recoverable from $p < nm$ measurements for certain combinations of $p$ and $r$ [8]. The operator $\mathcal{A}$ offers so-called Gaussian measurements when the representation of the operator as a matrix has i.i.d. Gaussian entries. Empirical work by Recht, Xu and Hassibi [9, 10], Fazel, Parrilo and Recht [8], Tanner and Wei [11] and Oymak and Hassibi [12] documented for Gaussian measurements a phase transition phenomenon, i.e. a fairly sharp transition from success to failure as $r$ increases, for a given $p$. Putting $\rho = r/m$ and $\delta = p/(mn)$, it appears that there is a critical sampling rate $\delta^*(\rho) = \delta^*(\rho; \beta)$, such that, for $\delta > \delta^*(\rho)$, NNM is successful for large $m, n$, while for $\delta < \delta^*(\rho)$, NNM fails. $\delta^*(\rho)$ provides a sharp "sampling limit" for low rank matrices, i.e. a clear statement of how many measurements are needed to recover a low rank matrix, by a popular and computationally tractable algorithm.

In very recent work, [5-7] have shown empirically that the precise location of the phase transition coincides with the minimax MSE:

$$ \delta^*(\rho; \beta) = \mathcal{M}(\rho, \beta|\mathbf{X}) , \qquad \rho \in (0,1), \ \beta \in (0,1) . \tag{9} $$

A key requirement for discovering and verifying (9) empirically was to obtain an explicit formula for the right-hand side; that explicit formula is derived and proven in this paper. Relationship (9) connects two seemingly unrelated problems: matrix denoising from direct observations and matrix recovery from incomplete measurements. Both problems are attracting a large and growing research literature. Equation (9) demonstrates the importance of minimax MSE calculations even in a seemingly unrelated setting where there is no noise and no statistical decision to be made!

2 Results

2.1 Least-Favorable matrix. 

For SVST, any matrix of rank $r$ is least-favorable among $m \times n$ matrices of rank at most $r$, in the limit $\|X_0\| \to \infty$.

Theorem 1. The worst-case matrix for SVST has its principal subspace "at $\infty$". Let $\lambda > 0$, $m \le n \in \mathbb{N}$ and $1 \le r \le m$. For the worst-case risk of $\hat{X}_\lambda$ on $m \times n$ matrices of rank at most $r$, we have

$$ \sup_{\mathrm{rank}(X_0) \le r} R(\hat{X}_\lambda, X_0) = \lim_{\mu \to \infty} R(\hat{X}_\lambda, \mu C) , \tag{10} $$

where $C \in M_{m \times n}$ is any fixed matrix of rank exactly $r$.

2.2 Minimax MSE.

Let $W_i(m, n)$ denote the marginal distribution of the $i$-th largest eigenvalue of a standard central Wishart matrix $W_m(I, n)$, namely, the $i$-th largest eigenvalue of the random matrix $\frac{1}{n} Z Z'$ where $Z \in M_{m \times n}$ has i.i.d. $\mathcal{N}(0,1)$ entries. Define for $\Lambda > 0$ and $\alpha \in \{1/2, 1\}$

$$ \mathbf{M}_n(\Lambda; r, m, \alpha) = \frac{r}{m} + \frac{r}{n} - \frac{r^2}{mn} + \frac{r(n - r)}{mn} \Lambda^2 + \alpha \, \frac{n - r}{mn} \sum_{i=1}^{m-r} w_i(\Lambda; m - r; n - r) , \tag{11} $$

where

$$ w_i(\Lambda; m, n) = \int_{\Lambda^2}^{\infty} \left( \sqrt{t} - \Lambda \right)^2 dW_i(m, n)(t) \tag{12} $$

is a combination of the complementary incomplete moments of standard central Wishart eigenvalues

$$ \int_{\Lambda^2}^{\infty} t^{k/2} \, dW_i(m, n)(t) $$

for $k = 0, 1, 2$.

Theorem 2. An implicit formula for the finite-$n$ minimax MSE. The minimax MSE of SVST over $m$-by-$n$ matrices of rank at most $r$ is given by

$$ \mathcal{M}_n(r, m|Mat) = \min_{\Lambda} \mathbf{M}_n(\Lambda; r, m, 1) \quad \text{and} $$
$$ \mathcal{M}_n(r|Sym) = \min_{\Lambda} \mathbf{M}_n(\Lambda; r, n, 1/2) , $$

where the minimum on the right hand sides is unique.

In fact, we will see that $\mathbf{M}_n(\Lambda; r, m, \alpha)$ is convex in $\Lambda$. As the densities of the standard central Wishart eigenvalues $W_i(m, n)$ are known [13], this makes it possible, in principle, to tabulate the finite-$n$ minimax risk.






2.3 Minimax AMSE (Asymptotic MSE). 

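A more accessible formula is obtained by calculating the large-$n$ asymptotic minimax MSE, where $r = r(n)$ and $m = m(n)$ both grow proportionally to $n$. Let us write AMSE for Asymptotic MSE. For the case $\mathbf{X}_{m,n} = Mat_{m,n}$ we assume a limiting rank fraction $\rho = \lim_{n \to \infty} r/m$ and limiting aspect ratio $\beta = \lim_{n \to \infty} m/n$ and consider

$$ \mathcal{M}(\rho, \beta|Mat) = \lim_{n \to \infty} \mathcal{M}_n(r, m|Mat) = \lim_{n \to \infty} \inf_{\lambda} \sup_{X_0 \in M_{m \times n},\ \mathrm{rank}(X_0) \le \rho\beta n} \frac{1}{mn} \mathbb{E} \left\| \hat{X}_\lambda - X_0 \right\|_F^2 . \tag{13} $$

Similarly, for the case $\mathbf{X}_{m,n} = Sym_n$, we assume a limiting rank fraction $\rho = \lim_{n \to \infty} r/n$ and consider

$$ \mathcal{M}(\rho|Sym) = \lim_{n \to \infty} \mathcal{M}_n(r|Sym) = \lim_{n \to \infty} \inf_{\lambda} \sup_{X_0 \in S^n_+,\ \mathrm{rank}(X_0) \le \rho n} \frac{1}{n^2} \mathbb{E} \left\| \hat{X}_\lambda - X_0 \right\|_F^2 . \tag{14} $$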

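The Marcenko-Pastur distribution [14] gives the asymptotic empirical distribution of Wishart eigenvalues. It has density

$$ p_\gamma(t) = \frac{1}{2\pi\gamma t} \sqrt{(\gamma_+ - t)(t - \gamma_-)} \cdot \mathbf{1}_{[\gamma_-, \gamma_+]}(t) , \tag{15} $$

where $\gamma_\pm = (1 \pm \sqrt{\gamma})^2$. Define the complementary incomplete moments of the Marcenko-Pastur distribution

$$ P_\gamma(x; k) = \int_x^{\gamma_+} t^k \, p_\gamma(t) \, dt . \tag{16} $$

Finally, let

$$ \mathbf{M}(\Lambda; \rho, \tilde{\rho}, \alpha) = \rho + \tilde{\rho} - \rho\tilde{\rho} + (1 - \tilde{\rho}) \left[ \rho \Lambda^2 + \alpha (1 - \rho) \left( P_\gamma(\Lambda^2; 1) - 2\Lambda P_\gamma(\Lambda^2; \tfrac{1}{2}) + \Lambda^2 P_\gamma(\Lambda^2; 0) \right) \right] , \tag{17} $$

with $\gamma = \gamma(\rho, \tilde{\rho}) = (\tilde{\rho} - \rho\tilde{\rho}) / (\rho - \rho\tilde{\rho})$.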

Theorem 3. An explicit formula for the minimax Asymptotic MSE. For the minimax AMSE of SVST we have

$$ \mathcal{M}(\rho, \beta|Mat) = \min_{0 \le \Lambda \le \gamma_+} \mathbf{M}(\Lambda; \rho, \beta\rho, 1) \tag{18} $$
$$ \mathcal{M}(\rho|Sym) = \min_{0 \le \Lambda \le \gamma_+} \mathbf{M}(\Lambda; \rho, \rho, 1/2) \tag{19} $$

with $\gamma_+ = \left( 1 + \sqrt{(\beta - \beta\rho)/(1 - \beta\rho)} \right)^2$, where the minimum on the right hand sides is unique. Moreover, for any $0 < \beta \le 1$, the function $\rho \mapsto \mathcal{M}(\rho, \beta|Mat)$ is continuous and increasing on $\rho \in [0, 1]$, with $\mathcal{M}(0, \beta|Mat) = 0$ and $\mathcal{M}(1, \beta|Mat) = 1$. The same is true for $\mathcal{M}(\rho|Sym)$.

The curves $\rho \mapsto \mathcal{M}(\rho, \beta|Mat)$, for different values of $\beta$, are shown in Figure 1. The curves $\rho \mapsto \mathcal{M}(\rho, 1|Mat)$ and $\rho \mapsto \mathcal{M}(\rho|Sym)$ are shown in Figure 2.

Figure 1: The minimax AMSE curves for case Mat, defined in (18), for a few values of $\beta$.

Figure 2: The minimax AMSE curves for case Mat with $\beta = 1$ and case Sym.

2.4 Computing the minimax AMSE. 

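To compute $\mathcal{M}(\rho, \beta|Mat)$ and $\mathcal{M}(\rho|Sym)$ we need to minimize (17). Define

$$ \Lambda_*(\rho, \beta, \alpha) = \operatorname{argmin}_{\Lambda} \mathbf{M}(\Lambda; \rho, \beta\rho, \alpha) . \tag{20} $$

Theorem 4. A characterization of the minimax AMSE for general $\beta$. For any $\alpha \in \{1/2, 1\}$ and $\beta \in (0, 1]$, the function $\rho \mapsto \Lambda_*(\rho, \beta, \alpha)$ is decreasing on $\rho \in [0, 1]$ with

$$ \lim_{\rho \to 0} \Lambda_*(\rho, \beta, \alpha) = \Lambda_*(0, \beta, \alpha) = 1 + \sqrt{\beta} \quad \text{and} \tag{21} $$
$$ \lim_{\rho \to 1} \Lambda_*(\rho, \beta, \alpha) = \Lambda_*(1, \beta, \alpha) = 0 . \tag{22} $$

For $\rho \in (0, 1)$, the minimizer $\Lambda_*(\rho, \beta, \alpha)$ is the unique root of the equation in $\Lambda$

$$ P_\gamma(\Lambda^2; \tfrac{1}{2}) - \Lambda \cdot P_\gamma(\Lambda^2; 0) = \frac{\rho \Lambda}{\alpha (1 - \rho)} , \tag{23} $$

where the left hand side of (23) is a decreasing function of $\Lambda$.

The minimizer $\Lambda_*(\rho, \beta, \alpha)$ can therefore be determined numerically by binary search. (In fact, we will see that $\Lambda_*$ is the unique minimizer of the convex function $\Lambda \mapsto \mathbf{M}(\Lambda; \rho, \tilde{\rho}, \alpha)$.) Evaluating $\mathcal{M}(\rho, \beta|Mat)$ and $\mathcal{M}(\rho|Sym)$ to precision $\epsilon$ thus requires $O(\log(1/\epsilon))$ evaluations of the complementary incomplete Marcenko-Pastur moments (16).

For square matrices ($\beta = 1$), this computation turns out to be even simpler, and only requires evaluation of elementary trigonometric functions.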
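The binary search described above can be sketched as follows. This is a minimal illustration, not the authors' Matlab script: it assumes the stationarity condition has the form $P_\gamma(\Lambda^2; 1/2) - \Lambda P_\gamma(\Lambda^2; 0) = \rho\Lambda / (\alpha(1-\rho))$ (obtained by differentiating (17) in $\Lambda$), takes $\gamma_\pm = (1 \pm \sqrt{\gamma})^2$, and approximates the incomplete moments with a simple midpoint rule after the substitution $t = s^2$. All function names and the grid size are hypothetical choices.

```python
import numpy as np

def mp_incomplete_moment(x, k, gamma, num=20001):
    """Approximate the integral of t^k p_gamma(t) over [x, gamma_+] (midpoint rule in s = sqrt(t))."""
    lo_edge = (1.0 - np.sqrt(gamma)) ** 2
    hi_edge = (1.0 + np.sqrt(gamma)) ** 2
    a, b = np.sqrt(max(x, lo_edge)), np.sqrt(hi_edge)
    if a >= b:
        return 0.0
    grid = np.linspace(a, b, num)
    s = 0.5 * (grid[1:] + grid[:-1])  # midpoints, so the endpoints are never evaluated
    root = np.sqrt(np.clip((hi_edge - s**2) * (s**2 - lo_edge), 0.0, None))
    dens = root / (np.pi * gamma * s)  # equals p_gamma(s^2) * 2s
    return float(np.sum(s ** (2 * k) * dens) * (b - a) / (num - 1))

def gamma_of(rho, beta):
    """gamma = (rho~ - rho*rho~) / (rho - rho*rho~) with rho~ = beta*rho."""
    return beta * (1.0 - rho) / (1.0 - beta * rho)

def worst_case_amse(lam, rho, beta, alpha=1.0):
    """Formula (17) at threshold lam (capital Lambda in the text)."""
    g = gamma_of(rho, beta)
    p0, ph, p1 = (mp_incomplete_moment(lam**2, k, g) for k in (0.0, 0.5, 1.0))
    tail = p1 - 2.0 * lam * ph + lam**2 * p0
    rho_t = beta * rho
    return rho + rho_t - rho * rho_t + (1 - rho_t) * (rho * lam**2 + alpha * (1 - rho) * tail)

def minimax_threshold(rho, beta, alpha=1.0, tol=1e-8):
    """Bisection for the root of the (assumed) stationarity condition on [0, 1 + sqrt(gamma)]."""
    g = gamma_of(rho, beta)

    def f(lam):
        ph = mp_incomplete_moment(lam**2, 0.5, g)
        p0 = mp_incomplete_moment(lam**2, 0.0, g)
        return ph - lam * p0 - rho * lam / (alpha * (1 - rho))

    lo, hi = 0.0, 1.0 + np.sqrt(g)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:  # f is decreasing in lam, so the root lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

rho, beta = 0.1, 0.5
lam_star = minimax_threshold(rho, beta)
print(lam_star, worst_case_amse(lam_star, rho, beta))
```

Case Sym would correspond to calling the same functions with `beta=1.0` and `alpha=0.5`. A production implementation would likely replace the midpoint rule with an adaptive quadrature.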

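Theorem 5. A characterization of the minimax AMSE for $\beta = 1$. We have

$$ \mathbf{M}(\Lambda; \rho, \rho, \alpha) = \rho(2 - \rho) + (1 - \rho) \left[ \rho \Lambda^2 + \alpha (1 - \rho) \left( Q_2(\Lambda) - 2\Lambda Q_1(\Lambda) + \Lambda^2 Q_0(\Lambda) \right) \right] , \tag{24} $$

where

$$ Q_0(x) = \frac{1}{\pi} \int_x^2 \sqrt{4 - t^2} \, dt = 1 - \frac{x}{2\pi} \sqrt{4 - x^2} - \frac{2}{\pi} \, \mathrm{atan}\!\left( \frac{x}{\sqrt{4 - x^2}} \right) \tag{25} $$
$$ Q_1(x) = \frac{1}{\pi} \int_x^2 t \sqrt{4 - t^2} \, dt = \frac{1}{3\pi} (4 - x^2)^{3/2} \tag{26} $$
$$ Q_2(x) = \frac{1}{\pi} \int_x^2 t^2 \sqrt{4 - t^2} \, dt = 1 - \frac{1}{4\pi} x \sqrt{4 - x^2} \, (x^2 - 2) - \frac{2}{\pi} \, \mathrm{asin}\!\left( \frac{x}{2} \right) \tag{27} $$

are the complementary incomplete moments of the Quarter Circle law. Moreover, for $\alpha \in \{1/2, 1\}$

$$ \Lambda_*(\rho, 1, \alpha) = 2 \cdot \sin(\theta_\alpha(\rho)) , \tag{28} $$

where $\theta_\alpha(\rho) \in [0, \pi/2]$ is the unique solution to the transcendental equation

$$ \theta + \cot(\theta) \cdot \left( 1 - \tfrac{1}{3} \cos^2(\theta) \right) = \frac{\pi (\rho + \alpha - \alpha\rho)}{2 (\alpha - \alpha\rho)} . \tag{29} $$

The left hand side of (29) is a decreasing function of $\theta$.

In [15] we make available a Matlab script, and a web-based calculator for evaluating $\mathcal{M}(\rho, \beta|Mat)$ and $\mathcal{M}(\rho|Sym)$. The implementation provided employs binary search to solve (23) (or (29) for $\beta = 1$) and then feeds the minimizer $\Lambda_*$ into (17) (or into (24) for $\beta = 1$).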
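For the square case, the procedure just described fits in a few lines. The sketch below is an independent Python illustration, not the Matlab script of [15]: it assumes the closed forms of the Quarter Circle moments (written with $\mathrm{asin}(x/2)$, which equals $\mathrm{atan}(x/\sqrt{4 - x^2})$ on $[0, 2)$), the transcendental equation $\theta + \cot\theta \, (1 - \cos^2\theta / 3) = \pi(\rho + \alpha - \alpha\rho) / (2(\alpha - \alpha\rho))$, and the relation $\Lambda_* = 2\sin\theta$. Function names are hypothetical.

```python
import math

def q0(x):
    """Zeroth complementary incomplete moment of the Quarter Circle law."""
    return 1 - x * math.sqrt(4 - x * x) / (2 * math.pi) - (2 / math.pi) * math.asin(x / 2)

def q1(x):
    """First complementary incomplete moment."""
    return (4 - x * x) ** 1.5 / (3 * math.pi)

def q2(x):
    """Second complementary incomplete moment."""
    return 1 - x * math.sqrt(4 - x * x) * (x * x - 2) / (4 * math.pi) - (2 / math.pi) * math.asin(x / 2)

def amse_square(lam, rho, alpha):
    """Worst-case AMSE for beta = 1 at threshold lam, following the form of (24)."""
    tail = q2(lam) - 2 * lam * q1(lam) + lam**2 * q0(lam)
    return rho * (2 - rho) + (1 - rho) * (rho * lam**2 + alpha * (1 - rho) * tail)

def minimax_square(rho, alpha, tol=1e-12):
    """Solve the transcendental equation by bisection; return (threshold, minimax AMSE)."""
    target = math.pi * (rho + alpha - alpha * rho) / (2 * (alpha - alpha * rho))
    lhs = lambda th: th + (1 - math.cos(th) ** 2 / 3) / math.tan(th)
    lo, hi = 1e-12, math.pi / 2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > target:  # lhs is decreasing in theta, so the root lies to the right
            lo = mid
        else:
            hi = mid
    lam = 2 * math.sin(0.5 * (lo + hi))
    return lam, amse_square(lam, rho, alpha)

for rho in (0.01, 0.1, 0.5):
    print(rho, minimax_square(rho, 1.0), minimax_square(rho, 0.5))
```

Here `alpha=1.0` corresponds to case Mat and `alpha=0.5` to case Sym.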
2.5 Asymptotically optimal tuning for the SVST threshold $\lambda$.

The crucial functional $\Lambda_*$, defined in (20), can now be explained as the optimal (minimax) threshold of SVST in a special system of units. Let $\lambda_*(m, n, r|\mathbf{X})$ denote the minimax tuning threshold, namely

$$ \lambda_*(m, n, r|\mathbf{X}) = \operatorname{argmin}_{\lambda} \sup_{X_0 \in \mathbf{X}_{m,n},\ \mathrm{rank}(X_0) \le r} \frac{1}{mn} \mathbb{E}_{X_0} \left\| \hat{X}_\lambda(X_0 + Z) - X_0 \right\|_F^2 . $$

Theorem 6. Asymptotic minimax tuning of SVST. Consider again a sequence $n \mapsto (m(n), r(n))$ with a limiting rank fraction $\rho = \lim_{n \to \infty} r/m$ and a limiting aspect ratio $\beta = \lim_{n \to \infty} m/n$. For the asymptotic minimax tuning threshold we have

$$ \lim_{n \to \infty} \frac{1}{\sqrt{n}} \lambda_*(m, n, r|Mat) = \sqrt{1 - \beta\rho} \cdot \Lambda_*(\rho, \beta, 1) \quad \text{and} $$
$$ \lim_{n \to \infty} \frac{1}{\sqrt{n}} \lambda_*(n, r|Sym) = \sqrt{1 - \rho} \cdot \Lambda_*(\rho, 1, 1/2) . $$

The curves $\rho \mapsto \lim_{n \to \infty} \lambda_*(m, n, r|Mat)/\sqrt{n}$, namely the scaled asymptotic minimax tuning threshold for SVST, are shown in Figure 3 for different values of $\beta$. The curves $\rho \mapsto \lim_{n \to \infty} \lambda_*(n, n, r|Mat)/\sqrt{n}$ and $\rho \mapsto \lim_{n \to \infty} \lambda_*(n, r|Sym)/\sqrt{n}$ are shown in Figure 4.



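2.6 Parametric representation of the minimax AMSE for square matrices.

For square matrices ($\tilde{\rho} = \rho$, $\beta = 1$) the minimax curves $\mathcal{M}(\rho, 1|Mat)$ and $\mathcal{M}(\rho|Sym)$ admit a parametric representation in the $(\rho, \mathcal{M})$ plane using elementary trigonometric functions.

Theorem 7. Parametric representation of the minimax AMSE curve for $\beta = 1$. As $\theta$ ranges over $[0, \pi/2]$,

$$ \rho(\theta) = 1 - \frac{\pi/2}{\theta + \cot(\theta) \cdot \left( 1 - \tfrac{1}{3} \cos^2(\theta) \right)} $$
$$ \mathcal{M}(\theta) = 2\rho(\theta) - \rho^2(\theta) + 4\rho(\theta)(1 - \rho(\theta)) \sin^2(\theta) + \frac{4}{\pi} (1 - \rho(\theta))^2 \left[ (\pi - 2\theta) \left( \tfrac{5}{4} - \cos^2(\theta) \right) + \frac{\sin(2\theta)}{12} \left( \cos(2\theta) - 14 \right) \right] $$

is a parametric representation of $\rho \mapsto \mathcal{M}(\rho, 1|Mat)$, and similarly

$$ \rho(\theta) = \frac{\theta + \cot(\theta) \cdot \left( 1 - \tfrac{1}{3} \cos^2(\theta) \right) - \pi/2}{\theta + \cot(\theta) \cdot \left( 1 - \tfrac{1}{3} \cos^2(\theta) \right) + \pi/2} $$
$$ \mathcal{M}(\theta) = 2\rho(\theta) - \rho^2(\theta) + 4\rho(\theta)(1 - \rho(\theta)) \sin^2(\theta) + \frac{2}{\pi} (1 - \rho(\theta))^2 \left[ (\pi - 2\theta) \left( \tfrac{5}{4} - \cos^2(\theta) \right) + \frac{\sin(2\theta)}{12} \left( \cos(2\theta) - 14 \right) \right] $$

is a parametric representation of $\rho \mapsto \mathcal{M}(\rho|Sym)$.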
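The parametric form lends itself to tracing both curves without any root finding. The sketch below is a minimal illustration under the assumption that the bracketed term equals $(\pi - 2\theta)(5/4 - \cos^2\theta) + \sin(2\theta)(\cos(2\theta) - 14)/12$, which is what one obtains by substituting $\Lambda = 2\sin\theta$ into the Quarter Circle moments; the helper names are hypothetical.

```python
import math

def _t(theta):
    """theta + cot(theta) * (1 - cos^2(theta) / 3), shared by both parametrizations."""
    return theta + (1 - math.cos(theta) ** 2 / 3) / math.tan(theta)

def _bracket(theta):
    """Bracketed trigonometric term of the parametric AMSE (assumed form)."""
    return ((math.pi - 2 * theta) * (1.25 - math.cos(theta) ** 2)
            + math.sin(2 * theta) / 12 * (math.cos(2 * theta) - 14))

def _amse(theta, rho, coeff):
    """Common AMSE expression; coeff is 4/pi for case Mat and 2/pi for case Sym."""
    return (2 * rho - rho**2 + 4 * rho * (1 - rho) * math.sin(theta) ** 2
            + coeff * (1 - rho) ** 2 * _bracket(theta))

def curve_mat(theta):
    """Point (rho, AMSE) on the case Mat curve with beta = 1."""
    rho = 1 - (math.pi / 2) / _t(theta)
    return rho, _amse(theta, rho, 4 / math.pi)

def curve_sym(theta):
    """Point (rho, AMSE) on the case Sym curve."""
    t = _t(theta)
    rho = (t - math.pi / 2) / (t + math.pi / 2)
    return rho, _amse(theta, rho, 2 / math.pi)

# theta = pi/2 corresponds to rho = 0; small theta corresponds to rho near 1.
for theta in (0.1, 0.5, math.pi / 4, 1.2, math.pi / 2):
    print(theta, curve_mat(theta), curve_sym(theta))
```

Sweeping `theta` over a grid in $(0, \pi/2]$ and plotting the returned pairs would reproduce the shape of the square-case curves.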



Figure 3: (Nonsquare Cases.) The scaled asymptotic minimax tuning threshold for SVST, $\rho \mapsto \lim_{n \to \infty} \lambda_*(m, n, r|Mat)/\sqrt{n}$, when $m/n \to \beta$ and $r/m \to \rho$, for a few values of $\beta$ ($\beta$ = 0.25, 0.33, 0.5, 0.67, 0.75, 1).

Figure 4: (Square Case.) The scaled asymptotic minimax tuning threshold for SVST, $\rho \mapsto \lim_{n \to \infty} \lambda_*(n, n, r|Mat)/\sqrt{n}$ and $\rho \mapsto \lim_{n \to \infty} \lambda_*(n, r|Sym)/\sqrt{n}$, when $r/m \to \rho$.

2.7 Minimax AMSE near $\rho = 0$.

Theorem 8. Minimax AMSE to first order in $\rho$ near $\rho = 0$. For the behavior of the minimax curves near $\rho = 0$ we have

$$ \mathcal{M}(\rho, \beta|Mat) = 2 \left( 1 + \sqrt{\beta} + \beta \right) \cdot \rho + o(\rho) $$

and in particular

$$ \mathcal{M}(\rho, 1|Mat) = 6\rho + o(\rho) . $$

Moreover,

$$ \mathcal{M}(\rho|Sym) = 6\rho + o(\rho) . $$

The minimax AMSE curves $\rho \mapsto \mathcal{M}(\rho, \beta|Mat)$ for small values of $\rho$, and the corresponding approximation slopes $2(1 + \sqrt{\beta} + \beta)$ are shown in Figure 5 for several values of $\beta$. We find it surprising that asymptotically, symmetric positive definite matrices are no easier to recover than general square matrices. This phenomenon is also seen in the case of sparse vector denoising, where in the limit of extreme sparsity, the non negativity of the non zeros does not allow one to reduce the minimax MSE.
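As a heuristic consistency check (not the proof given later in the paper), the slope in Theorem 8 can be read off from (17) by plugging in the limiting threshold $\Lambda = 1 + \sqrt{\beta}$ from (21). The sketch assumes $\gamma_+ = (1 + \sqrt{\gamma})^2$ and uses $\gamma = \beta(1 - \rho)/(1 - \beta\rho) \le \beta$, so that $\Lambda^2 \ge \gamma_+$ and all incomplete moments in (17) vanish:

```latex
\begin{align*}
\mathbf{M}\bigl(1+\sqrt{\beta};\,\rho,\,\beta\rho,\,\alpha\bigr)
  &= \rho + \beta\rho - \beta\rho^{2}
     + (1-\beta\rho)\,\rho\,\bigl(1+\sqrt{\beta}\bigr)^{2} \\
  &= \bigl(1 + \beta + 1 + 2\sqrt{\beta} + \beta\bigr)\,\rho + O(\rho^{2}) \\
  &= 2\bigl(1+\sqrt{\beta}+\beta\bigr)\,\rho + O(\rho^{2}) .
\end{align*}
```

Since the minimax AMSE is a minimum over $\Lambda$, this only shows that it is at most $2(1 + \sqrt{\beta} + \beta)\rho + O(\rho^2)$; the matching lower bound is part of the theorem. The computation does not depend on $\alpha$, which is consistent with cases Mat ($\beta = 1$) and Sym sharing the slope 6.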






2.8 AMSE vs. the asymptotic global minimax MSE 

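In (3) we have introduced the global minimax MSE $\mathcal{M}^*_{m,n}(r|\mathbf{X})$, namely the minimax risk over all measurable denoisers $\hat{X} : M_{m \times n} \to M_{m \times n}$. To define the large-$n$ asymptotic global minimax MSE analogous to (13), consider sequences where $r = r(n)$ and $m = m(n)$ both grow proportionally to $n$, such that both limits $\rho = \lim_{n \to \infty} r/m$ and $\beta = \lim_{n \to \infty} m/n$ exist. Define the asymptotic global minimax MSE

$$ \mathcal{M}^*(\rho, \beta|\mathbf{X}) = \lim_{n \to \infty} \mathcal{M}^*_{m,n}(r|\mathbf{X}) . \tag{30} $$

Theorem 9. 1. For the global minimax MSE we have

$$ \mathcal{M}^*_{m,n}(r|\mathbf{X}) \ge \frac{r}{m} + \frac{r}{n} - \frac{r^2 + r}{mn} \tag{31} $$

for case Mat and, if $m = n$, for case Sym.

2. For the asymptotic global minimax MSE we have

$$ \mathcal{M}^*(\rho, \beta|\mathbf{X}) \ge \rho + \tilde{\rho} - \rho\tilde{\rho} \tag{32} $$

for case Mat and, if $\beta = 1$, for case Sym. Here $\tilde{\rho} = \beta\rho$.

3. Let

$$ \mathcal{M}^-(\rho, \beta) = \rho + \tilde{\rho} - \rho\tilde{\rho} \tag{33} $$

denote our lower bound on asymptotic global minimax MSE. Then

$$ \frac{\mathcal{M}(\rho, \beta)}{\mathcal{M}^-(\rho, \beta)} \le 2 \left( 1 + \frac{\sqrt{\beta}}{1 + \beta} \right) \tag{34} $$

and

$$ \lim_{\rho \to 0} \frac{\mathcal{M}(\rho, \beta)}{\mathcal{M}^-(\rho, \beta)} = 2 \left( 1 + \frac{\sqrt{\beta}}{1 + \beta} \right) . \tag{35} $$

¹ Compare results in [4] with [16]. To be clear, in both matrix denoising and vector denoising, there is an MSE advantage for each fixed positive rank fraction/sparsity fraction. It's just that the benefit goes away as either fraction tends to 0.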



Figure 5: The minimax AMSE curves $\rho \mapsto \mathcal{M}(\rho, \beta|Mat)$ for small values of $\rho$ (dashed lines) and the corresponding approximation slopes $2(1 + \sqrt{\beta} + \beta)$ (solid lines), for $\beta$ = 0.25, 0.33, 0.5, 0.67, 0.75, 1.



2.9 Outline 

The body of the paper proves the above results. Section 3 introduces notation, and proves auxiliary lemmas. In Section 4 we characterize the worst-case MSE of SVST for matrices of fixed size. Section 5 derives formula (11) for the worst-case MSE, and proves Theorem 2. In Section 6 we pass to the large-$n$ limit, deriving formula (17), which provides the worst-case asymptotic MSE in the large-$n$ limit, and prove Theorem 3. In Section 7 we investigate the minimizer of the asymptotic worst-case MSE function, and its minimum, namely the minimax AMSE, and prove Theorems 4 and 5. We then connect the minimizer of the worst-case AMSE function to the minimax tuning threshold for SVST and prove Theorem 6. Finally, we derive a parametric representation of the minimax AMSE curve for square matrices (Theorem 7). In Section 8 we calculate the first order approximation of the minimax AMSE around $\rho = 0$ and prove Theorem 8. In Section 9 we extend the discussion scope from SVST denoisers to all denoisers, investigate the global minimax MSE, and prove Theorem 9. A derivation of the Stein Unbiased Risk Estimate for SVST, which is instrumental in the proof of Theorem 1, is discussed in Appendix A.

3 Preliminaries

3.1 Scaling 

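Our main object of interest, the worst-case MSE of SVST,

$$ \sup_{\mathrm{rank}(X_0) \le \rho m} \frac{1}{mn} \mathbb{E} \left\| \hat{X}_\lambda - X_0 \right\|_F^2 , \tag{36} $$

is more conveniently expressed using a specially calibrated risk function. Since the SVST denoisers are scale-invariant,

$$ \mathbb{E}_X \left\| \hat{X}_\lambda(X + \sigma Z) - X \right\|_F^2 = \sigma^2 \, \mathbb{E}_{X/\sigma} \left\| \hat{X}_{\lambda/\sigma}(X/\sigma + Z) - X/\sigma \right\|_F^2 , $$

we are free to introduce the scaling $\sigma = n^{-1/2}$ and define the risk function of a denoiser $\hat{X} : M_{m \times n} \to M_{m \times n}$ at $X_0 \in M_{m \times n}$ by

$$ R(\hat{X}, X_0) := \frac{1}{m} \mathbb{E} \left\| \hat{X}\!\left( X_0 + \tfrac{1}{\sqrt{n}} Z \right) - X_0 \right\|_F^2 . \tag{37} $$

Then, the worst-case MSE of $\hat{X}_\lambda$ is given by

$$ \sup_{\mathrm{rank}(X_0) \le \rho m} \frac{1}{mn} \mathbb{E} \left\| \hat{X}_\lambda - X_0 \right\|_F^2 = \sup_{X_0 \in M_{m \times n},\ \mathrm{rank}(X_0) \le \rho m} R(\hat{X}_{\lambda/\sqrt{n}}, X_0) . \tag{38} $$

To vary the SNR in the problem, it will be convenient to vary the norm of the signal matrix $X_0$ instead, namely, to consider $Y = \mu X_0 + \frac{1}{\sqrt{n}} Z$ with $\frac{1}{m} \|X_0\|_F^2 = 1$.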



3.2 Notation 

• Throughout this text, $Y$ will denote the data matrix $Y = \mu X_0 + \frac{1}{\sqrt{n}} Z$.

• $M_{m \times n}$ and $O_m$ denote the set of real-valued $m$-by-$n$ matrices, and the group of $m$-by-$m$ orthogonal matrices, respectively.

• $\|\cdot\|_F$ denotes the Frobenius matrix norm on $M_{m \times n}$, namely the Euclidean norm of a matrix considered as a vector in $\mathbb{R}^{mn}$.

• We denote matrix multiplication by either $AB$ or $A \cdot B$.

• We use the following convenient diag notation. For a matrix $X \in M_{m \times n}$, we denote by $X_\Delta \in \mathbb{R}^m$ its main diagonal,

$$ (X_\Delta)_i = X_{i,i} , \qquad 1 \le i \le m . \tag{39} $$

Similarly, for a vector $x \in \mathbb{R}^m$, and $n \ge m$ that we suppress in our notation, we denote by $x_\Delta \in M_{m \times n}$ the "diagonal" matrix

$$ (x_\Delta)_{i,j} = \begin{cases} x_i & 1 \le i = j \le m \\ 0 & \text{otherwise} \end{cases} \tag{40} $$

• We denote a "fat" Singular Value Decomposition (SVD) of $X \in M_{m \times n}$ by $X = U_X \cdot x_\Delta \cdot V_X'$, with $U_X \in M_{m \times m}$ and $V_X \in M_{n \times n}$. Note that the SVD is not uniquely determined, and in particular $x$ can contain the singular values of $X$ in any order. Unless otherwise noted, we will assume that the entries of $x$ are non-negative and sorted in non-increasing order, $x_1 \ge \ldots \ge x_m \ge 0$. When $m < n$, the last $n - m$ columns of $V_Y$ are not uniquely determined; we will see that our various results do not depend on this choice. Note that with the "fat" SVD, the matrices $Y$ and $U_Y' \cdot Y \cdot V_Y$ have the same dimensionality, which simplifies the notation we will need.

• When appropriate, we let univariate functions act on vectors entry-wise, namely, for $x \in \mathbb{R}^m$ and $f : \mathbb{R} \to \mathbb{R}$, we write $f(x) \in \mathbb{R}^m$ for the vector with entries $f(x)_i = f(x_i)$.

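3.3 $\hat{X}_\lambda$ acts by soft thresholding of the data singular values

By orthogonal invariance of the Frobenius norm, for a matrix $X \in M_{m \times n}$,

$$ \|Y - X\|_F^2 = \left\| y_\Delta - U_Y' \cdot X \cdot V_Y \right\|_F^2 \ge \left\| y - (U_Y' \cdot X \cdot V_Y)_\Delta \right\|_2^2 , $$

with equality only if $(U_Y, V_Y)$ diagonalizes $X$. Therefore, (1) is equivalent to

$$ \hat{x}_\lambda = \operatorname{argmin}_{x \in \mathbb{R}^m} \ \tfrac{1}{2} \|y - x\|_2^2 + \lambda \|x\|_1 , \tag{41} $$

through the relation $\hat{X}_\lambda(Y) = U_Y \cdot (\hat{x}_\lambda)_\Delta \cdot V_Y'$. It is well known that the solution to (41) is given by $\hat{x}_\lambda = y_\lambda$, where $y_\lambda = (y - \lambda)_+$ denotes coordinate-wise soft thresholding of $y$ with threshold $\lambda$. We conclude that the SVST estimator (1) is given by

$$ \hat{X}_\lambda : Y \mapsto U_Y \cdot (y_\lambda)_\Delta \cdot V_Y' . \tag{42} $$

Note that (42) is well defined, that is, $\hat{X}_\lambda(Y)$ does not depend on the particular SVD $Y = U_Y \cdot (y)_\Delta \cdot V_Y'$ chosen.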
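The estimator just described (compute an SVD, soft threshold the singular values, reassemble) can be sketched with NumPy as follows. This is a minimal illustration rather than the authors' code: it uses the thin SVD, which yields the same matrix as the "fat" SVD because the dropped columns multiply zeros, and the threshold in the usage lines is an arbitrary illustrative choice, not the minimax threshold.

```python
import numpy as np

def svst(Y, lam):
    """Singular value soft thresholding: replace each singular value s by max(s - lam, 0)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt  # scales column i of U by the shrunken s_i

# Illustrative usage on a synthetic low-rank-plus-noise matrix.
rng = np.random.default_rng(0)
m, n, r = 20, 30, 2
X0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))  # rank r almost surely
Y = X0 + rng.standard_normal((m, n))
lam = np.sqrt(m) + np.sqrt(n)  # hypothetical choice near the noise singular value edge
X_hat = svst(Y, lam)
print(np.linalg.matrix_rank(X_hat), np.linalg.norm(X_hat - X0) ** 2 / (m * n))
```

Since the singular values of the output are exactly the thresholded singular values of the input, the result does not depend on sign or basis ambiguities in the computed SVD.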



In case Sym, observe that the solution to (1) is constrained to lie in the linear subspace of symmetric matrices. The solution is the same whether the noise matrix $Z \in M_{n \times n}$ has i.i.d. standard normal entries, or whether $Z$ is a symmetric Wigner matrix $\frac{1}{2}(Z_1 + Z_1')$ where $Z_1 \in M_{n \times n}$ has i.i.d. standard normal entries. Below, we assume that the data in case Sym is of the form $X_0 + Z$ where $X_0 \in S^n_+$ and $Z$ has this Wigner form, namely, the singular values $y$ are the absolute values of eigenvalues of the symmetric matrix $X_0 + Z$.

3.4 The singular values of Y when | |Xo| | — > oo 

Our main results depend on the following crucial observations regarding the singular 
values oi Xq + Z when Xq is a rank r matrix and is rescaled so | |Xo| | — t- oo. 

Lemma 1. Let Y^ = fiXo + Z = U^- (y^) a ■ V"^/ where Xq G M^xn is of rank r and assume 
that y is sorted in non-increasing order, yi > ■ ■ ■ > ym- Then 

lim y^^i = cx), i = l,...,r. 
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Proof. For a symmetric matrix S, let \i{S) denote the i-th eigenvalue in nonincreasing 
order. The Courant-Fischer min-max characterization of eigenvalues states that 

XAS) = min max v'S'v . (43) 

codim{U)^i — l l|v|| — 1 

Of course the squared singular values of F^ obey yf = Aj(y'^F^). 

Choose 1 < i < r. Let U < M™ with codim(U) = i — 1. Since m — r + 1 < dim{U) < m 
necessarily U fl Im{Xo) ^ and we can choose v G f/ fl /m(Xo) with 1 1 v| | = 1. For this 
v, and with the yi again denoting the singular values of Y^, 

yl > v'i;r;v, l<i<r. (44) 

Without loss of generality, we may assume we are working in the basis of the SVD
of $X_0$, so that $X_0$ is vanishing except in the first $r$ positions of the diagonal. We let $G_r$
denote the $r \times r$ sub-matrix of the upper left corner of $X_0X_0'$; this is the 'principal piece'
of $X_0X_0'$ where the non-zeros live; the rest of $X_0X_0'$ is vanishing in this basis. Also let $Z_r$
denote the upper left $r$ by $r$ corner of $Z$, and let $X_r$ denote the $r$ by $r$ diagonal matrix
whose diagonal entries are the ordered singular values of $X_0$. Let $H$ denote the $m \times m$ matrix
which is zero except in the upper left corner, where there is a nonzero $r \times r$ sub-matrix
$H_r$ given by $H_r = X_r Z_r' + Z_r X_r'$. We also need $W_r$, the upper left $r$ by $r$ corner of the
Wishart matrix $ZZ'$. Finally we note that when represented in this basis, $v = (v_r, 0_{m-r})$,
i.e. the only non-zeros in $v$ occur in the first $r$ places, and we let $v_r$ denote the $r \times 1$
column vector of those first $r$ entries.

For a symmetric matrix $S$, let $\lambda_{\min}(S)$ denote the smallest eigenvalue. Note that
$\lambda_{\min}(ZZ') \ge 0$; that $\lambda_{\min}(G_r) > 0$ by our hypothesis that the rank of $X_0$ is $r$; and finally
note that $\lambda_{\min}(H_r)$ may be positive or negative, but is in any case well-defined.³

Now $Y_\mu Y_\mu' = \mu^2 (X_0X_0') + \mu X_0Z' + \mu ZX_0' + ZZ'$. Taking into account our choice of
coordinates, the upper left $r$ by $r$ block of $Y_\mu Y_\mu'$ is $\mu^2 G_r + \mu H_r + W_r$. The vector $v$ 'sees'
only the first $r$ coordinates and

$$v' Y_\mu Y_\mu' v = v_r' (\mu^2 G_r + \mu H_r + W_r) v_r \ge \mu^2 \lambda_{\min}(G_r) + \mu \lambda_{\min}(H_r) + \lambda_{\min}(W_r) .$$

This lower bound does not depend on the specific choice of $U$ or of $v$. Because $\lambda_{\min}(G_r) > 0$,
this lower bound tends to $\infty$ as $\mu \to \infty$; we finally invoke (44). □
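The mechanism behind Lemma 1 can be checked numerically. The sketch below is an assumption-laden illustration, not part of the proof: the dimensions, the random rank-$r$ signal and the grid of $\mu$ values are all hypothetical. It records the top $r$ singular values of $\mu X_0 + Z$ and compares them with Weyl's inequality $y_{\mu,i} \ge \mu\, x_i - \|Z\|_2$, which already forces them to diverge.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 8, 2

# Hypothetical rank-r signal and one fixed noise draw.
X0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Z = rng.standard_normal((m, n))

def singular_values(mu):
    """Singular values of Y_mu = mu * X0 + Z, sorted in non-increasing order."""
    return np.linalg.svd(mu * X0 + Z, compute_uv=False)

mus = [1.0, 10.0, 100.0, 1000.0]
top_r = np.array([singular_values(mu)[:r] for mu in mus])

# Weyl's inequality gives the lower bound mu * x_i - ||Z||_2 for i <= r.
x0 = np.linalg.svd(X0, compute_uv=False)[:r]
norm_Z = np.linalg.norm(Z, 2)
lower_bounds = np.array([mu * x0 - norm_Z for mu in mus])
```

Each row of `top_r` dominates the corresponding row of `lower_bounds`, and the bound grows linearly in $\mu$ because the $r$ nonzero singular values of $X_0$ are strictly positive.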

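Definition 1. For a pair of matrices $X_0, Z \in M_{m\times n}$, we denote by $\zeta(X_0, Z | \mathrm{Mat}) =
(\zeta_1, \dots, \zeta_{m-r})$ the singular values, in non-increasing order, of

$$\Pi_m \cdot Z \cdot \Pi_n' \in M_{(m-r)\times(n-r)} , \tag{45}$$

where $\Pi_m : \mathbb{R}^m \to \mathbb{R}^{m-r}$ is the projection of $\mathbb{R}^m$ on $\mathrm{null}(X_0') = \mathrm{Im}(X_0)^\perp$ and $\Pi_n :
\mathbb{R}^n \to \mathbb{R}^{n-r}$ is the projection on $\mathrm{null}(X_0)$. Similarly, for a pair of matrices $X_0, Z \in M_{n\times n}$,
denote by $\zeta(X_0, Z | \mathrm{Sym}) = (\zeta_1, \dots, \zeta_{m-r})$ the eigenvalues, in non-increasing order, of

$$\Pi_m \cdot \tfrac{1}{\sqrt{2}}(Z + Z') \cdot \Pi_n' \in M_{(n-r)\times(n-r)} . \tag{46}$$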

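³ Moreover, for a fixed choice of $X_0$, one could control the distribution of $\lambda_{\min}(H_r)$.

Lemma 2. Let $Y_\mu = \mu X_0 + Z = U_\mu \cdot (y_\mu)_\Delta \cdot V_\mu'$, where $X_0 \in M_{m\times n}$ is of rank $r$, and write
$y_\mu = (y_{\mu,1}, \dots, y_{\mu,m})$ with $y_{\mu,1} \ge \dots \ge y_{\mu,m}$. Also, define

$$x_\mu = (U_\mu' \cdot \mu X_0 \cdot V_\mu)_\Delta , \qquad z_\mu = (U_\mu' \cdot Z \cdot V_\mu)_\Delta ,$$

with $x_\mu = (x_{\mu,1}, \dots, x_{\mu,m})$ and $z_\mu = (z_{\mu,1}, \dots, z_{\mu,m})$. Finally, let $(\zeta_1, \dots, \zeta_{m-r}) = \zeta(X_0, Z | X)$ as in
Definition 1, where X = Mat or X = Sym. Then the $m - r$ lower singular values of $Y_\mu$ satisfy

$$\lim_{\mu\to\infty} y_{\mu,r+i} = |\zeta_i| \tag{47}$$

$$\lim_{\mu\to\infty} z_{\mu,r+i} = \zeta_i \tag{48}$$

$$\lim_{\mu\to\infty} x_{\mu,r+i} = 0 . \tag{49}$$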

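Proof. Assume Case Mat. Write $\tilde Y_\sigma = \sigma Y_{1/\sigma} = X_0 + \sigma Z$ and let $\tilde Y_\sigma = \tilde U_\sigma \cdot (\tilde y_\sigma)_\Delta \cdot \tilde V_\sigma'$
denote the SVD of $\tilde Y_\sigma$. Note that $\tilde U_{1/\mu} = U_\mu$ and $\tilde V_{1/\mu} = V_\mu$. Write

$$\tilde U_\sigma = [\, \tilde u_\sigma^1 \ \cdots \ \tilde u_\sigma^m \,], \qquad \tilde V_\sigma = [\, \tilde v_\sigma^1 \ \cdots \ \tilde v_\sigma^n \,],$$

with $\tilde u_\sigma^i \in \mathbb{R}^m$ and $\tilde v_\sigma^i \in \mathbb{R}^n$, for the columns of $\tilde U_\sigma$ and $\tilde V_\sigma$. Similarly write $u_\mu^i$ (resp. $v_\mu^i$)
for the columns of $U_\mu$ (resp. $V_\mu$).

Let $\tilde U_{\sigma,m-r}$ and $\tilde V_{\sigma,n-r}$ denote matrices consisting of the last $m - r$ and $n - r$ columns
of $\tilde U_\sigma$ and $\tilde V_\sigma$, again in the ordering where the largest singular value has index $i = 1$.
Similarly, let $\tilde U_{0,r}$ and $\tilde V_{0,r}$ denote the sub-matrices of the first $r$ columns.

Because the $r$ nonzero singular values of $X_0$ are separated from 0, we can apply
the eigenvector perturbation analysis in Theorem 8.1.7 of Golub and Van Loan [17]
(originally based on P. Stewart's work) to obtain the representation

$$\tilde U_{\sigma,m-r} = (\tilde U_{0,m-r} + \tilde U_{0,r} P_\sigma)\, Q_\sigma , \qquad \tilde V_{\sigma,n-r} = (\tilde V_{0,n-r} + \tilde V_{0,r} R_\sigma)\, S_\sigma ,$$

where $P_\sigma, Q_\sigma, R_\sigma, S_\sigma$ all depend on $\sigma$, $X_0$, and $Z$, with $P_\sigma \in M_{r\times(m-r)}$, $Q_\sigma \in M_{(m-r)\times(m-r)}$,
$R_\sigma \in M_{r\times(n-r)}$ and $S_\sigma \in M_{(n-r)\times(n-r)}$. In short, the perturbed singular vectors are representable
using the unperturbed ones. In this representation, the cited Theorem provides the
following control of the coefficients:

$$\|P_\sigma\|_2 = O(\sigma), \quad \|R_\sigma\|_2 = O(\sigma), \quad \sigma \to 0; \qquad \|Q_\sigma\|_2 \le 2, \quad \|S_\sigma\|_2 \le 2, \quad \sigma < \sigma_0 ,$$

where $\|\cdot\|_2$ is the operator norm on $\ell_2$. Remarking that

$$\tilde U_{0,m-r}' X_0 = 0 \quad \text{and} \quad X_0 \tilde V_{0,n-r} = 0 ,$$

we have

$$\tilde U_{\sigma,m-r}' X_0 \tilde V_{\sigma,n-r} = Q_\sigma' P_\sigma' \tilde U_{0,r}' X_0 \tilde V_{0,r} R_\sigma S_\sigma ,$$

and so

$$\big| (\tilde U_{\sigma,m-r}' X_0 \tilde V_{\sigma,n-r})_{i,i} \big| \le \| Q_\sigma' P_\sigma' \tilde U_{0,r}' X_0 \tilde V_{0,r} R_\sigma S_\sigma \|_2
\le \|P_\sigma\|_2 \cdot \|Q_\sigma\|_2 \cdot \|X_0\|_2 \cdot \|R_\sigma\|_2 \cdot \|S_\sigma\|_2 = O(\sigma^2) .$$

Recalling $\sigma = 1/\mu$, and that $u_\mu^{r+i}$ is the $i$-th column of $\tilde U_{\sigma,m-r}$ with $\sigma = 1/\mu$, and analogously
for $v_\mu^{r+i}$, we obtain

$$\lim_{\mu\to\infty} |x_{\mu,r+i}| = \lim_{\mu\to\infty} \big| (u_\mu^{r+i})' \cdot (\mu X_0) \cdot v_\mu^{r+i} \big|
= \lim_{\mu\to\infty} \mu \cdot \big| (\tilde U_{1/\mu,m-r}' X_0 \tilde V_{1/\mu,n-r})_{i,i} \big|
= \lim_{\mu\to\infty} \mu \cdot O(\mu^{-2}) = 0 ,$$

for $1 \le i \le m - r$.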

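The projectors $\Pi_m$ and $\Pi_n$ of Definition 1 obey $\tilde u_0^{r+i} = \Pi_m \tilde u_0^{r+i}$ and $\tilde v_0^{r+i} = \Pi_n \tilde v_0^{r+i}$
for $i = 1, \dots, m - r$. As $\lim_{\sigma\to 0} \tilde u_\sigma^{r+i} = \lim_{\sigma\to 0} \Pi_m \tilde u_\sigma^{r+i}$ and $\lim_{\sigma\to 0} \tilde v_\sigma^{r+i} = \lim_{\sigma\to 0} \Pi_n \tilde v_\sigma^{r+i}$, it
follows that for $1 \le i \le m - r$ we have $\lim_{\mu\to\infty} y_{\mu,r+i} = \|\Pi_m Y_\mu \Pi_n' - A_i\|_2$, where $\|\cdot\|_2$
is the operator norm on $\ell_2$ and $A_i \in M_{(m-r)\times(n-r)}$ is the best rank-$(i-1)$ approximation of
$\Pi_m Y_\mu \Pi_n' = \Pi_m Z \Pi_n'$ in Frobenius norm, namely, $\lim_{\mu\to\infty} y_{\mu,r+i} = \zeta_i$. Finally, since
$y_\mu = x_\mu + z_\mu$, this together with (49) implies (48).

The case Sym involves eigenvalues rather than singular values, making the proof
simpler but otherwise identical.

□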
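Limit (47) of Lemma 2 lends itself to a quick numerical sanity check in case Mat. The following sketch uses hypothetical dimensions, a hypothetical random signal and one large value of $\mu$; it builds $\Pi_m$ and $\Pi_n$ from the full SVD of $X_0$ and compares the $m - r$ lower singular values of $Y_\mu$ with $\zeta(X_0, Z | \mathrm{Mat})$.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, r = 5, 7, 2

# Hypothetical rank-r signal and one fixed noise draw.
X0 = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
Z = rng.standard_normal((m, n))

# Orthonormal bases of null(X0') = Im(X0)^perp and of null(X0), from the full SVD of X0.
U0, s0, V0t = np.linalg.svd(X0, full_matrices=True)
Pi_m = U0[:, r:].T    # (m - r) x m, rows span Im(X0)^perp
Pi_n = V0t[r:, :]     # (n - r) x n, rows span null(X0)

# zeta: singular values of the projected noise, as in (45).
zeta = np.linalg.svd(Pi_m @ Z @ Pi_n.T, compute_uv=False)

# Lower singular values of Y_mu for one large mu.
mu = 1e6
y = np.linalg.svd(mu * X0 + Z, compute_uv=False)
lower = y[r:]
```

For this value of $\mu$ the vectors `lower` and `zeta` agree to several digits, while the first $r$ entries of `y` are of order $\mu$.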



4 The Least-Favorable Matrix for SVST is at $\|X\| = \infty$

We now prove Theorem 1, which characterizes the worst-case MSE of the SVST
denoiser $\hat X_\lambda$ for a given $\lambda$. The theorem follows from a combination of two classical gems
of the statistical literature. The first is Stein's Unbiased Risk Estimate (SURE) (78) from
1981, which we specialize to the SVST estimator (see also [5]). The second is Anderson's
celebrated monotonicity property for the integral of a symmetric unimodal probability
distribution over a symmetric convex set [18], from 1955, and more specifically
its implications for monotonicity of the power function of certain tests in multivariate
hypothesis testing [19].



To simplify the proof, we introduce the following definitions, which will be used in
this section only.

Definition 2. A weak notion of matrix majorization based on singular values. Let
$A, B \in M_{m\times n}$ have singular value vectors $a, b \in \mathbb{R}^m$ respectively, which as usual we
assume are sorted in non-increasing order: $0 \le a_m \le \dots \le a_1$ and $0 \le b_m \le \dots \le b_1$. If
$a_i \le b_i$ for $i = 1, \dots, m$, we write $A \preceq B$.

Definition 3. Orthogonally invariant function of a matrix argument. We say that
$f : M_{m\times n} \to \mathbb{R}$ is an orthogonally invariant function if $f(U \cdot A \cdot V) = f(A)$ for all
$A \in M_{m\times n}$ and all orthogonal $U \in O_m$ and $V \in O_n$.

Definition 4. SV-monotone increasing function of a matrix argument. Let $f : M_{m\times n} \to
\mathbb{R}$ be orthogonally invariant. If, whenever $A \preceq B$ and $\sigma > 0$, $f$ satisfies

$$E f(A + Z) \le E f(B + Z) , \tag{50}$$

for $Z \in M_{m\times n}$ and $Z_{i,j} \overset{iid}{\sim} N(0, \sigma^2)$, we say that $f$ is singular-value-monotone increasing,
or SV-monotone increasing.

We first observe that by rescaling an arbitrary rank-$r$ matrix, it is always possible to
majorize any fixed matrix of rank at most $r$ (in the sense of Definition 2).

Lemma 3. Let $C \in M_{m\times n}$ be a matrix of rank $r$ and let $X \in M_{m\times n}$ be a matrix of rank
at most $r$. Then there exists $\mu > 0$ for which $X \preceq \mu C$.

Proof. Let $c, x$ be the singular value vectors of $C, X$ respectively, sorted in non-increasing
order. Then $c_r > 0$. Take $\mu = x_1/c_r$. For $1 \le i \le r$ we have $x_i \le x_1 = \mu c_r \le \mu c_i$, and for
$r + 1 \le i \le m$ we have $\mu c_i = x_i = 0$. □
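The construction in the proof of Lemma 3 is explicit, so it can be written down directly. In the sketch below the helper name `majorizing_scale` and the two diagonal test matrices are hypothetical choices for illustration.

```python
import numpy as np

def majorizing_scale(C, X, r):
    """Return mu = x_1 / c_r as in the proof of Lemma 3 (C is assumed to have rank exactly r)."""
    c = np.linalg.svd(C, compute_uv=False)   # singular values of C, non-increasing
    x = np.linalg.svd(X, compute_uv=False)   # singular values of X, non-increasing
    return x[0] / c[r - 1]

C = np.diag([4.0, 2.0, 0.0])   # rank 2
X = np.diag([5.0, 0.0, 0.0])   # rank 1, which is at most 2
mu = majorizing_scale(C, X, r=2)

# Check the majorization X <= mu * C singular value by singular value.
x = np.linalg.svd(X, compute_uv=False)
c = np.linalg.svd(C, compute_uv=False)
is_majorized = bool(np.all(x <= mu * c + 1e-12))
```

Here $\mu = 5/2$, so the singular values of $\mu C$ are $(10, 5, 0)$, which dominate $(5, 0, 0)$ entry by entry.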

To establish that $f$ is SV-monotone increasing, it is enough to show that $f$ is
SV-monotone increasing with respect to each singular value individually:

Lemma 4. Let $f : M_{m\times n} \to \mathbb{R}$ be an orthogonally invariant function such that

$$E f(a_\Delta + Z) \le E f((a + \delta e_i)_\Delta + Z)$$

for any $a \in \mathbb{R}^m$, $\delta > 0$ and $1 \le i \le m$, where $e_i$ is the canonical basis vector, $(e_i)_j = \delta_{ij}$.
Then $f$ is SV-monotone increasing.

Proof. Let $A, B \in M_{m\times n}$ with singular value decompositions $A = U_A \cdot a_\Delta \cdot V_A'$ and
$B = U_B \cdot b_\Delta \cdot V_B'$, and assume $A \preceq B$, namely $a_i \le b_i$, $i = 1, \dots, m$. Since $f$ is orthogonally
invariant and $U_A' \cdot Z \cdot V_A$ has the same distribution as $Z$, we have $E f(A + Z) = E f(U_A' \cdot A \cdot V_A + U_A' \cdot Z \cdot V_A) =
E f(a_\Delta + Z)$ and similarly $E f(B + Z) = E f(b_\Delta + Z)$. By assumption,

$$\begin{aligned}
E f(A + Z) &= E f((a_1, \dots, a_m)_\Delta + Z) \\
&\le E f((b_1, a_2, \dots, a_m)_\Delta + Z) \\
&\le \cdots \\
&\le E f((b_1, \dots, b_{m-1}, a_m)_\Delta + Z) \\
&\le E f((b_1, b_2, \dots, b_{m-1}, b_m)_\Delta + Z) = E f(B + Z) .
\end{aligned}$$

□

Das Gupta, Anderson and Mudholkar [19, thm. 1] gave a useful extension of the
seminal monotonicity result of Anderson [18]: we present it using our standing notation.

Theorem 10. Let $W \in M_{m\times n}$ be a random matrix whose rows $W_j \in \mathbb{R}^n$ ($1 \le j \le m$)
are mutually independent, each with a vector normal distribution $W_j \sim N_n(k_j w_j, \Sigma_j)$, where
$k_j \ge 0$ and $w_j \in \mathbb{R}^n$. Assume that $E \subset M_{m\times n}$ is a convex set, symmetric in each $W_j$ given the
other $W_i$ ($i \ne j$), in the sense that $W \in E$ iff $S \cdot W \in E$ for any diagonal $S \in M_{m\times m}$ with $S_\Delta \in \{\pm 1\}^m$.
Then $P(W \in E)$ is monotone non-increasing in each $k_j$ as long as $k_j \ge 0$.

Theorem 10 asserts monotonicity of expectation of indicators. Using the co-area
formula for expectation of non-negative random variables, we obtain monotonicity of
expectation of quasi-concave (and hence quasi-convex) functions:

Lemma 5. Bounded invariant quasi-convex functions are SV-monotone increasing.
Let $f : M_{m\times n} \to \mathbb{R}$ be a bounded orthogonally invariant function. Assume that $f$ is a
quasi-convex function, in the sense that for all $c \in \mathbb{R}$ the set $f^{-1}((-\infty, c])$ is convex in
$M_{m\times n}$. Then $f$ is SV-monotone increasing.

Proof. Let $a \in \mathbb{R}^m$, $\delta > 0$ and $1 \le i \le m$. Define $b = a + \delta e_i$. By Lemma 4, it is enough
to show that

$$E f(a_\Delta + Z) \le E f(b_\Delta + Z) . \tag{51}$$

Let $g = -f$ so that $g$ is quasi-concave, orthogonally invariant and bounded. Without
loss of generality we may assume that $\inf g = 0$. By the co-area formula we have for
the expectation of $g(a_\Delta + Z)$:

$$E g(a_\Delta + Z) = \int_0^\infty P\{ g(a_\Delta + Z) \ge c \}\, dc ,$$

and it is therefore enough to show that for all $c > 0$,

$$P\{ g(a_\Delta + Z) \ge c \} \ge P\{ g(b_\Delta + Z) \ge c \} . \tag{52}$$

We invoke Anderson monotonicity, Theorem 10, with the value of $i$ chosen above,
$k_i = a_i$, $w_j$ the standard basis vector $(w_j)_\ell = \delta_{j,\ell}$ and $\Sigma_j = I_n$, and with $E = g^{-1}([c, \infty))$
above. Since $g$ is quasi-concave, $E$ is a convex set in $M_{m\times n}$. Since $f$ (and hence $g$) is
orthogonally invariant, $g(a_\Delta + Z) = g(S \cdot (a_\Delta + Z))$, namely $g$ is invariant under row sign
changes, so the symmetry requirement in the theorem is satisfied. The theorem therefore
holds, and we obtain

$$P\{ a_\Delta + Z \in g^{-1}([c, \infty)) \} \ge P\{ b_\Delta + Z \in g^{-1}([c, \infty)) \}$$

as required. □

The sufficient condition for SV-monotonicity we will actually use is:

Lemma 6. Assume that $f : M_{m\times n} \to \mathbb{R}$ can be decomposed as $f = \sum_{k=1}^{s} f_k$, where for
each $1 \le k \le s$, $f_k : M_{m\times n} \to \mathbb{R}$ is SV-monotone increasing. Then $f$ is SV-monotone
increasing.

Proof. Let $A, B \in M_{m\times n}$ with $A \preceq B$. By Lemma 5 we have for each $k$ that $E f_k(A +
Z) \le E f_k(B + Z)$. Therefore $E f(A + Z) = \sum_{k=1}^{s} E f_k(A + Z) \le \sum_{k=1}^{s} E f_k(B + Z) =
E f(B + Z)$. □

The final key ingredient in the proof of Theorem 1 is the Stein Unbiased Risk Estimate
for SVST. In Appendix A we prove:

Lemma 7. The Stein Unbiased Risk Estimate for SVST. For each $\lambda > 0$, there exists
an event $S \subset M_{m\times n}$ and a function $\mathrm{SURE}_\lambda : S \to \mathbb{R}$, given in Eq. (78) farther below,
with the following properties:

1. $P(S) = 1$, where $P$ is the distribution of the matrix $Z$ with $Z_{ij} \overset{iid}{\sim} N(0, 1)$.

2. $\mathrm{SURE}_\lambda$ is a finite sum of bounded, orthogonally invariant, quasi-convex functions.

3. Denoting as usual $Y = X_0 + Z/\sqrt{n} \in M_{m\times n}$, where $X_0, Z \in M_{m\times n}$ and $Z_{ij} \overset{iid}{\sim}
N(0, 1)$, we have

$$R(\hat X_\lambda, X_0) = \frac{1}{m} E_{X_0} \mathrm{SURE}_\lambda(Y) .$$


Putting together Lemma [6] and Lemma [3, we come to a crucial property of SVST. 

Lemma 8. The risk of SVST is monotone non-decreasing in the signal singular values.
For each $\lambda > 0$, the map $X \mapsto R(\hat X_\lambda, X)$ is a bounded, SV-monotone increasing
function. In particular, let $A, B \in M_{m\times n}$ with $A \preceq B$. Then

$$R(\hat X_\lambda, A) \le R(\hat X_\lambda, B) . \tag{53}$$

Proof. By Lemma 7 the function $\mathrm{SURE}_\lambda : M_{m\times n} \to \mathbb{R}$ satisfies the conditions of
Lemma 6 and is therefore SV-monotone increasing. It follows that

$$R(\hat X_\lambda, A) = \frac{1}{m} E\, \mathrm{SURE}_\lambda(A + Z/\sqrt{n}) \le \frac{1}{m} E\, \mathrm{SURE}_\lambda(B + Z/\sqrt{n}) = R(\hat X_\lambda, B) .$$

To see that the risk is bounded, note that for any $X \in M_{m\times n}$ we have by Lemma 7

$$-\infty < \frac{1}{m} \inf_Y \mathrm{SURE}_\lambda(Y) \le R(\hat X_\lambda, X) \le \frac{1}{m} \sup_Y \mathrm{SURE}_\lambda(Y) < \infty .$$

□

Proof of Theorem 1. By Lemma 8, the map $\mu \mapsto R(\hat X_\lambda, \mu C)$ is bounded and monotone
non-decreasing in $\mu$, hence $\lim_{\mu\to\infty} R(\hat X_\lambda, \mu C)$ exists and is finite, and

$$R(\hat X_\lambda, \mu_0 C) \le \lim_{\mu\to\infty} R(\hat X_\lambda, \mu C) \tag{54}$$

for all $\mu_0 > 0$. Since $\mathrm{rank}(C) = r$, obviously

$$\sup_{\mathrm{rank}(X_0) \le r} R(\hat X_\lambda, X_0) \ge \lim_{\mu\to\infty} R(\hat X_\lambda, \mu C) ,$$

and we only need to show the reverse inequality. Let $X_0 \in M_{m\times n}$ be an arbitrary matrix
of rank at most $r$.

By Lemma 3 there exists $\mu_0$ such that $X_0 \preceq \mu_0 C$. It now follows from Lemma 8 and
(54) that

$$R(\hat X_\lambda, X_0) \le R(\hat X_\lambda, \mu_0 C) \le \lim_{\mu\to\infty} R(\hat X_\lambda, \mu C) .$$

□



5 Worst-Case MSE 

Let $\lambda$ and $r \le m \le n$, and consider them fixed for the remainder of this section. Our
second main result, Theorem 2, follows immediately from Theorem 1 combined with
the following lemma.

Lemma 9. Let $X_0 \in M_{m\times n}$ be of rank $r$. Then

$$\lim_{\mu\to\infty} R(\hat X_\lambda, \mu X_0) = M_n\!\left( \frac{\lambda}{\sqrt{1 - r/n}} ;\, r, m, \alpha \right)$$

as defined in (11), with $\alpha = 1$ for case Mat and $\alpha = 1/2$ for case Sym.

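Proof. Assume for simplicity that the data singular value vector $y$ is sorted in non-increasing
order: $y_1 \ge \dots \ge y_m$. Define

$$Y_\mu = \mu X_0 + \tfrac{1}{\sqrt{n}} Z = U_\mu \cdot (y_\mu)_\Delta \cdot V_\mu' ,$$

and write $x_\lambda$ for the entry-wise soft thresholding $(x - \lambda)_+$, so that $\hat X_\lambda(Y) = U_Y \cdot (y_\lambda)_\Delta \cdot V_Y'$
is the SVST denoiser. By invariance of the Frobenius norm to orthogonal transformations,
we have

$$\begin{aligned}
\| \hat X_\lambda(Y_\mu) - \mu X_0 \|_F^2 &= \| \mu X_0 - U_\mu \cdot (y_{\mu,\lambda})_\Delta \cdot V_\mu' \|_F^2 = \| U_\mu' \cdot \mu X_0 \cdot V_\mu - (y_{\mu,\lambda})_\Delta \|_F^2 \\
&= \| (U_\mu' \cdot \mu X_0 \cdot V_\mu)_\Delta - y_{\mu,\lambda} \|_2^2 + \| \mu X_0 \|_F^2 - \| (U_\mu' \cdot \mu X_0 \cdot V_\mu)_\Delta \|_2^2 .
\end{aligned}$$

Introducing the "pinching" [20] notation

$$x_\mu = (U_\mu' \cdot \mu X_0 \cdot V_\mu)_\Delta , \qquad z_\mu = \tfrac{1}{\sqrt{n}} (U_\mu' \cdot Z \cdot V_\mu)_\Delta ,$$

so that $y_\mu = x_\mu + z_\mu$, and using the fact that the off-diagonal parts of $U_\mu' \cdot \mu X_0 \cdot V_\mu$ and of
$\tfrac{1}{\sqrt{n}} U_\mu' \cdot Z \cdot V_\mu$ cancel (their sum $U_\mu' Y_\mu V_\mu$ is diagonal), whence

$$\| \mu X_0 \|_F^2 - \| (U_\mu' \cdot \mu X_0 \cdot V_\mu)_\Delta \|_2^2 = \tfrac{1}{n} \| Z \|_F^2 - \tfrac{1}{n} \| (U_\mu' \cdot Z \cdot V_\mu)_\Delta \|_2^2 ,$$

we get

$$\| \hat X_\lambda(Y_\mu) - \mu X_0 \|_F^2 = \| y_{\mu,\lambda} - x_\mu \|_2^2 + \tfrac{1}{n} \| Z \|_F^2 - \| z_\mu \|_2^2 . \tag{55}$$

Therefore,

$$\lim_{\mu\to\infty} R(\hat X_\lambda, \mu X_0) = \lim_{\mu\to\infty} \left( \tfrac{1}{m} E \| y_{\mu,\lambda} - x_\mu \|_2^2 + 1 - \tfrac{1}{m} E \| z_\mu \|_2^2 \right) \tag{56}$$

$$= \frac{1}{m} \sum_{i=1}^{r} \lim_{\mu\to\infty} E \big( (y_{\mu,i} - \lambda)_+ - x_{\mu,i} \big)^2 \tag{57}$$

$$+ \frac{1}{m} \sum_{i=r+1}^{m} \lim_{\mu\to\infty} E \big( (y_{\mu,i} - \lambda)_+ - x_{\mu,i} \big)^2 \tag{58}$$

$$- \frac{1}{m} \sum_{i=1}^{r} \lim_{\mu\to\infty} E (z_{\mu,i})^2$$

$$- \frac{1}{m} \sum_{i=r+1}^{m} \lim_{\mu\to\infty} E (z_{\mu,i})^2 \tag{59}$$

$$+ 1 .$$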

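We now proceed to evaluate each of the terms (57), (58) and (59) in turn. Starting with
(57), observe that $E z_{\mu,i} = 0$ for all $i$. By Lemma 1, pointwise, $\lim_{\mu\to\infty} y_{\mu,i} = \infty$, so that
$\lim_{\mu\to\infty} ((y_{\mu,i} - \lambda)_+ - x_{\mu,i}) = \lim_{\mu\to\infty} (y_{\mu,i} - \lambda - x_{\mu,i})$. It follows that for $1 \le i \le r$,

$$\lim_{\mu\to\infty} E \big( (y_{\mu,i} - \lambda)_+ - x_{\mu,i} \big)^2 = \lim_{\mu\to\infty} E \big( (x_{\mu,i} + z_{\mu,i} - \lambda)_+ - x_{\mu,i} \big)^2
= \lim_{\mu\to\infty} E (z_{\mu,i} - \lambda)^2 = \lambda^2 + \lim_{\mu\to\infty} E (z_{\mu,i})^2 .$$

Turning to (58), let $(\zeta_1, \dots, \zeta_{m-r}) = \zeta(X_0, Z | X)$ as in Definition 1. By Lemma 2 we
have $\lim_{\mu\to\infty} x_{\mu,i} = 0$ and $\lim_{\mu\to\infty} z_{\mu,i} = \zeta_{i-r}/\sqrt{n}$, for $r + 1 \le i \le m$, namely,

$$\lim_{\mu\to\infty} E \big( (y_{\mu,i} - \lambda)_+ - x_{\mu,i} \big)^2 = \frac{1}{n} E (\zeta_{i-r} - \sqrt{n} \lambda)_+^2 . \tag{60}$$

Finally, as for (59), again by Lemma 2 we have

$$\lim_{\mu\to\infty} E (z_{\mu,i})^2 = \frac{1}{n} E (\zeta_{i-r})^2 . \tag{61}$$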



Collecting the terms in (56), we find that the sum does not depend on the particular
choice of the signal matrix $X_0$, and in fact

$$\lim_{\mu\to\infty} R(\hat X_\lambda, \mu X_0) = 1 + \frac{r}{m} \lambda^2 + \frac{1}{mn} \sum_{i=1}^{m-r} E (\zeta_i - \sqrt{n} \lambda)_+^2 - \frac{1}{mn} E \sum_{i=1}^{m-r} \zeta_i^2 . \tag{62}$$

Now,

$$\frac{1}{mn} \sum_{i=1}^{m-r} E (\zeta_i - \sqrt{n} \lambda)_+^2 = \alpha \cdot \frac{n-r}{mn} \sum_{i=1}^{m-r} E \left( \frac{\zeta_i}{\sqrt{n-r}} - \frac{\lambda}{\sqrt{1 - r/n}} \right)_+^2 ,$$

with $\alpha = 1$ in case Mat and $\alpha = 1/2$ in case Sym. This factor in case Sym follows
from the fact that in case Sym, with probability 1/2 we have $\zeta_i < 0$, and conditional
on $\zeta_i > 0$, $\zeta_i$ follows the same distribution as in case Mat. Observe that in the case
Mat, $\{ (\zeta_i/\sqrt{n-r})^2 \}_{i=1,\dots,m-r}$ are the eigenvalues of a standard central Wishart matrix
$W_{m-r}(I, n - r)$, so that

$$\frac{1}{mn} \sum_{i=1}^{m-r} E (\zeta_i - \sqrt{n} \lambda)_+^2 = \alpha\, \frac{n-r}{mn} \sum_{i=1}^{m-r} w_i\!\left( \frac{\lambda}{\sqrt{1 - r/n}} ;\, m - r, n - r \right) ,$$

where $w_i$ was defined in (12).

Finally, observe that $\sum_{i=1}^{m-r} \zeta_i^2$ is the squared Frobenius norm of $Z_{(m-r)\times(n-r)}$, hence equals
$(m - r)(n - r)$ in expectation, to the effect that

$$\frac{1}{mn} E \sum_{i=1}^{m-r} \zeta_i^2 = \frac{(m-r)(n-r)}{mn} = 1 - \frac{r}{m} - \frac{r}{n} + \frac{r^2}{mn} . \tag{63}$$

Setting $\Lambda = \lambda/\sqrt{1 - r/n}$, we collect the terms and recover (11) as required. □
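The limit in Lemma 9 is easy to approximate by simulation in case Mat, because by the derivation above only the singular values of an $(m-r) \times (n-r)$ Gaussian matrix enter. The following is a rough Monte Carlo sketch under that reading; the function name `worst_case_mse_mat`, the number of trials and the seed are assumptions made here, and the code takes $w_i(\Lambda; m-r, n-r) = E(\zeta_i/\sqrt{n-r} - \Lambda)_+^2$ as in the display preceding (63).

```python
import numpy as np

def worst_case_mse_mat(Lam, r, m, n, trials=200, seed=0):
    """Monte Carlo sketch of M_n(Lam; r, m, alpha=1) for case Mat."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(trials):
        Zsub = rng.standard_normal((m - r, n - r))          # stands in for Pi_m Z Pi_n'
        zeta = np.linalg.svd(Zsub, compute_uv=False)
        total += np.sum(np.maximum(zeta / np.sqrt(n - r) - Lam, 0.0) ** 2)
    w_sum = total / trials                                   # estimates sum_i w_i(Lam; m-r, n-r)
    return (r / m + r / n - r * r / (m * n)
            + r * (n - r) / (m * n) * Lam ** 2
            + (n - r) / (m * n) * w_sum)

r, m, n = 2, 20, 40
mse_no_shrinkage = worst_case_mse_mat(0.0, r, m, n)    # should be close to 1 by (63)
mse_large_lam = worst_case_mse_mat(10.0, r, m, n)      # the Wishart sum vanishes here
```

At $\Lambda = 0$ the formula collapses to 1 by (63), which is the risk of returning $Y$ unchanged; for very large $\Lambda$ all singular values of the noise block are thresholded away and only the first four terms remain.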



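Lemma 10. The function $\Lambda \mapsto M_n(\Lambda; r, m, \alpha)$, defined in (11) on $\Lambda \in [0, \infty)$, is convex
and obtains a unique minimum.

Proof. Differentiating (12) under the integral w.r.t. $\Lambda$, since the upper integration limit
does not depend on $\Lambda$, we get

$$\begin{aligned}
\frac{d}{d\Lambda} w_i(\Lambda; m, n) &= \frac{d}{d\Lambda} \int_{\Lambda^2}^{\infty} (\sqrt{t} - \Lambda)^2 \, dW_i(m,n)(t) \\
&= -(\sqrt{\Lambda^2} - \Lambda)^2 \, \frac{dW_i}{dt}(\Lambda^2) \cdot (2\Lambda) + \int_{\Lambda^2}^{\infty} \frac{\partial}{\partial\Lambda} (\sqrt{t} - \Lambda)^2 \, dW_i(m,n) \\
&= -2 \int_{\Lambda^2}^{\infty} (\sqrt{t} - \Lambda) \, dW_i(m,n) .
\end{aligned}$$

Differentiating w.r.t. $\Lambda$ again, the boundary terms vanish again and we get

$$\frac{d^2}{d\Lambda^2} w_i(\Lambda; m, n) = -2 \int_{\Lambda^2}^{\infty} \frac{\partial}{\partial\Lambda} (\sqrt{t} - \Lambda) \, dW_i(m,n) = 2 \int_{\Lambda^2}^{\infty} dW_i(m,n) .$$

Therefore, by (11) we have

$$\begin{aligned}
\frac{d}{d\Lambda} M_n(\Lambda; r, m, \alpha) &= \frac{r(n-r)}{mn} \frac{d}{d\Lambda} \Lambda^2 + \alpha \frac{n-r}{mn} \sum_{i=1}^{m-r} \frac{d}{d\Lambda} w_i(\Lambda; m - r, n - r) \\
&= 2 \frac{r(n-r)}{mn} \Lambda - 2 \alpha \frac{n-r}{mn} \sum_{i=1}^{m-r} \int_{\Lambda^2}^{\infty} (\sqrt{t} - \Lambda) \, dW_i(m - r, n - r)
\end{aligned}$$

and

$$\begin{aligned}
\frac{d^2}{d\Lambda^2} M_n(\Lambda; r, m, \alpha) &= \frac{r(n-r)}{mn} \frac{d^2}{d\Lambda^2} \Lambda^2 + \alpha \frac{n-r}{mn} \sum_{i=1}^{m-r} \frac{d^2}{d\Lambda^2} w_i(\Lambda; m - r, n - r) \\
&= 2 \frac{r(n-r)}{mn} + 2 \alpha \frac{n-r}{mn} \sum_{i=1}^{m-r} \int_{\Lambda^2}^{\infty} dW_i(m - r, n - r) > 0 .
\end{aligned}$$

Therefore, $\Lambda \mapsto M_n(\Lambda; r, m, \alpha)$ is convex on $[0, \infty)$ with

$$\frac{d}{d\Lambda} M_n(0; r, m, \alpha) < 0 \quad \text{and} \quad \lim_{\Lambda\to\infty} \frac{d}{d\Lambda} M_n(\Lambda; r, m, \alpha) > 0 ,$$

and the lemma follows. □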

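Finally we can prove our second main result.

Proof of Theorem 2. Let $C \in M_{m\times n}$ be an arbitrary fixed matrix of rank $r$. For case
Mat, by Theorem 1 and Lemma 9,

$$\begin{aligned}
\mathcal{M}_n(r, m | \mathrm{Mat}) &= \inf_{\lambda} \sup_{\substack{X_0 \in M_{m\times n} \\ \mathrm{rank}(X_0) \le r}} R(\hat X_\lambda, X_0) \\
&= \inf_{\lambda > 0} \lim_{\mu\to\infty} R(\hat X_\lambda, \mu C) \\
&= \inf_{\lambda} M_n\!\left( \frac{\lambda}{\sqrt{1 - r/n}} ;\, r, m, 1 \right) \\
&= \inf_{\Lambda > 0} M_n(\Lambda; r, m, 1) \\
&= \min_{\Lambda > 0} M_n(\Lambda; r, m, 1) ,
\end{aligned}$$

where we have used Lemma 10, which also asserts that the minimum is unique.

Now let $C \in S^n_+$ be an arbitrary, fixed symmetric positive semidefinite matrix of
rank $r$. For case Sym, by the same lemmas,

$$\begin{aligned}
\mathcal{M}_n(r | \mathrm{Sym}) &= \inf_{\lambda} \sup_{\substack{X_0 \in S^n_+ \\ \mathrm{rank}(X_0) \le r}} R(\hat X_\lambda, X_0) \\
&= \inf_{\lambda} \lim_{\mu\to\infty} R(\hat X_\lambda, \mu C) \\
&= \inf_{\lambda} M_n\!\left( \frac{\lambda}{\sqrt{1 - r/n}} ;\, r, 1/2 \right) \\
&= \inf_{\Lambda} M_n(\Lambda; r, 1/2) \\
&= \min_{\Lambda} M_n(\Lambda; r, 1/2) ,
\end{aligned}$$

where we have used Lemma 10, which also asserts that the minimum is unique. □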

6 Worst-Case AMSE 

Toward the proof of our third main result, Theorem 3, let $\lambda$ be fixed. We first show
that in the proportional growth framework, where the rank $r(n)$, number of rows
$m(n)$ and number of columns $n$ all tend to $\infty$ proportionally to each other, the key
quantity in our formulas can be evaluated by complementary incomplete moments of
a Marcenko-Pastur distribution, instead of a sum of complementary incomplete moments
of Wishart eigenvalues.

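Lemma 11. Consider sequences $n \mapsto r(n)$ and $n \mapsto m(n)$ and numbers $0 < \beta \le 1$ and
$0 \le \rho \le 1$ such that $\lim_{n\to\infty} r(n)/m(n) = \rho$ and $\lim_{n\to\infty} m(n)/n = \beta$. Let $(\zeta_1(n), \dots, \zeta_{m-r}(n)) =
\zeta(X_0, Z | X)$, as in Definition 1, where $Z \in M_{m\times n}$ has i.i.d. $N(0, 1)$ entries. Define $\gamma =
(\beta - \rho\beta)/(1 - \rho\beta)$ and $\gamma_\pm = (1 \pm \sqrt{\gamma})^2$, and let $0 \le \Lambda \le \sqrt{\gamma_+}$. Then

$$\lim_{n\to\infty} \frac{1}{m} \sum_{i=1}^{m-r} E \left( \frac{\zeta_i}{\sqrt{n-r}} - \Lambda \right)_+^2 = (1 - \rho) \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 \, \frac{\sqrt{(\gamma_+ - t)(t - \gamma_-)}}{2\pi t \gamma} \, dt .$$

Proof. Write $\xi_i = \zeta_i^2/(n - r)$ and recall that by the Marcenko-Pastur law [14],

$$\lim_{n\to\infty} \frac{1}{m-r} \sum_{i=1}^{m-r} \delta_{\xi_i} = P_\gamma ,$$

in the sense of weak convergence of probability measures, where $P_\gamma$ is the Marcenko-Pastur
probability distribution with density $p_\gamma = dP_\gamma/dt$ given by (15). Now,

$$\begin{aligned}
\lim_{n\to\infty} \frac{1}{m} \sum_{i=1}^{m-r} E \big( \sqrt{\xi_i} - \Lambda \big)_+^2 &= \lim_{n\to\infty} \frac{1}{m} \sum_{i=1}^{m-r} E \int_{\Lambda^2}^{\infty} (\sqrt{t} - \Lambda)^2 \, \delta_{\xi_i}(t) \, dt \\
&= \lim_{n\to\infty} \frac{m-r}{m} \, E \int_{\Lambda^2}^{\infty} (\sqrt{t} - \Lambda)^2 \, \frac{1}{m-r} \sum_{i=1}^{m-r} \delta_{\xi_i}(t) \, dt \\
&= (1 - \rho) \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 \, p_\gamma(t) \, dt ,
\end{aligned}$$

as required. □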
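The Marcenko-Pastur integral on the right-hand side of Lemma 11 can be evaluated with elementary quadrature. The sketch below is one possible implementation under stated assumptions: it substitutes $t = s^2$, so that the density of $s = \sqrt{t}$ is $\sqrt{(\gamma_+ - s^2)(s^2 - \gamma_-)}/(\pi \gamma s)$, and applies a plain midpoint rule. The function names and grid size are hypothetical, and $0 < \gamma \le 1$ is assumed.

```python
import math

def mp_sv_density(s, gamma):
    """Density of s = sqrt(t) when t follows the Marcenko-Pastur law with ratio gamma in (0, 1]."""
    lo = (1.0 - math.sqrt(gamma)) ** 2
    hi = (1.0 + math.sqrt(gamma)) ** 2
    t = s * s
    if s <= 0.0 or t <= lo or t >= hi:
        return 0.0
    return math.sqrt((hi - t) * (t - lo)) / (math.pi * gamma * s)

def excess_moment(Lam, gamma, power=2, n_grid=20000):
    """Midpoint rule for the integral of (sqrt(t) - Lam)^power p_gamma(t) dt over [Lam^2, gamma_+]."""
    s_lo = max(Lam, 1.0 - math.sqrt(gamma))   # lower limit in the s = sqrt(t) variable
    s_hi = 1.0 + math.sqrt(gamma)             # sqrt(gamma_+)
    if s_lo >= s_hi:
        return 0.0
    h = (s_hi - s_lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        s = s_lo + (k + 0.5) * h
        total += (s - Lam) ** power * mp_sv_density(s, gamma)
    return total * h

gamma = 0.5
total_mass = excess_moment(0.0, gamma, power=0)     # integral of p_gamma
mean_of_t = excess_moment(0.0, gamma, power=2)      # integral of t p_gamma(t)
at_edge = excess_moment(1.0 + math.sqrt(gamma), gamma)
```

Two quick checks of the quadrature are that the density integrates to one and that the mean of $t$ equals one; at $\Lambda = \sqrt{\gamma_+}$ the integration range is empty and the value is zero.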

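Lemma 12. Let $m(n)$ and $r(n)$ be such that $\lim_{n\to\infty} m/n = \beta$ and $\lim_{n\to\infty} r/m = \rho$, and set
$\tilde\rho = \beta\rho$. Then

$$\lim_{n\to\infty} \sup_{\substack{X_0 \in M_{m\times n} \\ \mathrm{rank}(X_0) \le r}} R(\hat X_\lambda, X_0) = M\!\left( \frac{\lambda}{\sqrt{1 - \tilde\rho}} ;\, \rho, \tilde\rho, \alpha \right) ,$$

where the right hand side is defined in (17), with $\alpha = 1$ for case Mat and $\alpha = 1/2$ for
case Sym.

Proof. For case Mat, let $C(n) \in M_{m\times n}$ be an arbitrary fixed matrix of rank $r$. For case
Sym, let $C(n) \in S^n_+$ be an arbitrary, fixed symmetric positive semidefinite matrix of rank $r$.
By Theorem 1 and Lemma 9,

$$\begin{aligned}
\lim_{n\to\infty} \sup_{\mathrm{rank}(X_0) \le r} R(\hat X_\lambda, X_0) &= \lim_{n\to\infty} \lim_{\mu\to\infty} R(\hat X_\lambda, \mu C(n)) \\
&= \lim_{n\to\infty} M_n\!\left( \frac{\lambda}{\sqrt{1 - r/n}} ;\, r, m, \alpha \right) \\
&= \lim_{n\to\infty} \left[ \frac{r}{m} + \frac{r}{n} - \frac{r^2}{mn} + \frac{r}{m} \lambda^2 + \alpha \frac{n-r}{mn} \sum_{i=1}^{m-r} E \left( \frac{\zeta_i}{\sqrt{n-r}} - \frac{\lambda}{\sqrt{1 - r/n}} \right)_+^2 \right] \\
&= \rho + \tilde\rho - \rho\tilde\rho + (1 - \tilde\rho) \rho \Lambda^2 + \alpha (1 - \rho)(1 - \tilde\rho) \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 p_\gamma(t) \, dt \\
&= M\!\left( \frac{\lambda}{\sqrt{1 - \tilde\rho}} ;\, \rho, \tilde\rho, \alpha \right) ,
\end{aligned}$$

where we have used Lemma 11 and set $\Lambda = \lambda/\sqrt{1 - \tilde\rho}$. □

We now prove a variation of Lemma 10 for the asymptotic setting.

Lemma 13. The function $\Lambda \mapsto M(\Lambda; \rho, \tilde\rho, \alpha)$, defined in (17) on $\Lambda \in [0, \gamma_+]$, where $\gamma_+ =
\left( 1 + \sqrt{(\beta - \beta\rho)/(1 - \beta\rho)} \right)^2$, is convex and obtains a unique minimum.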



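Proof. Note that (17) is conveniently expressed as

$$M(\Lambda; \rho, \tilde\rho, \alpha) = \rho + \tilde\rho - \rho\tilde\rho + (1 - \tilde\rho) \left[ \rho \Lambda^2 + \alpha (1 - \rho) \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 p_\gamma(t) \, dt \right] . \tag{64}$$

Differentiating the rightmost term of (64) under the integral, we get

$$\frac{\partial}{\partial\Lambda} \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 p_\gamma(t) \, dt = (\sqrt{\gamma_+} - \Lambda)^2 p_\gamma(\gamma_+) \cdot \frac{\partial \gamma_+}{\partial\Lambda} - (\sqrt{\Lambda^2} - \Lambda)^2 p_\gamma(\Lambda^2) \cdot (2\Lambda) + \int_{\Lambda^2}^{\gamma_+} \frac{\partial}{\partial\Lambda} (\sqrt{t} - \Lambda)^2 p_\gamma(t) \, dt .$$

Since $p_\gamma(\gamma_+) = 0$, both boundary terms vanish and therefore

$$\frac{\partial}{\partial\Lambda} \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 p_\gamma(t) \, dt = -2 \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda) p_\gamma(t) \, dt = -2 P_\gamma(\Lambda^2; \tfrac{1}{2}) + 2 \Lambda P_\gamma(\Lambda^2; 0) ,$$

where $P_\gamma$ was defined in (16). Differentiating w.r.t. $\Lambda$ again, the boundary terms vanish
again and we get

$$\frac{\partial^2}{\partial\Lambda^2} \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda)^2 p_\gamma(t) \, dt = 2 \int_{\Lambda^2}^{\gamma_+} p_\gamma(t) \, dt = 2 P_\gamma(\Lambda^2; 0) .$$

By (64) we obtain

$$\frac{\partial}{\partial\Lambda} M(\Lambda; \rho, \tilde\rho, \alpha) = 2 (1 - \tilde\rho) \rho \Lambda + 2 \alpha (1 - \rho)(1 - \tilde\rho) \left( \Lambda P_\gamma(\Lambda^2; 0) - P_\gamma(\Lambda^2; \tfrac{1}{2}) \right) \tag{65}$$

and

$$\frac{\partial^2}{\partial\Lambda^2} M(\Lambda; \rho, \tilde\rho, \alpha) = 2 (1 - \tilde\rho) \rho + 2 \alpha (1 - \rho)(1 - \tilde\rho) P_\gamma(\Lambda^2; 0) > 0 .$$

Therefore, $\Lambda \mapsto M(\Lambda; \rho, \tilde\rho, \alpha)$ is convex on $[0, \gamma_+]$ with

$$\frac{\partial}{\partial\Lambda} M(0; \rho, \tilde\rho, \alpha) < 0 \quad \text{and} \quad \frac{\partial}{\partial\Lambda} M(\gamma_+; \rho, \tilde\rho, \alpha) > 0 ,$$

and the lemma follows. □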
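The convexity asserted in Lemma 13 can be observed numerically by evaluating (64) on a grid. The sketch below is self-contained and makes several assumptions: it uses a midpoint rule in the variable $s = \sqrt{t}$, the names `mp_integral`, `asymptotic_mse` and `argmin_golden` are hypothetical, and the values $\rho = 0.1$, $\beta = 0.5$ are arbitrary. The minimizer is located by golden-section search, which is appropriate here only because the objective is convex.

```python
import math

def mp_integral(Lam, gamma, power, n_grid=4000):
    """Midpoint rule for the integral of (sqrt(t) - Lam)^power p_gamma(t) dt over [Lam^2, gamma_+]."""
    root = math.sqrt(gamma)
    s_lo, s_hi = max(Lam, 1.0 - root), 1.0 + root
    if s_lo >= s_hi:
        return 0.0
    lo, hi = (1.0 - root) ** 2, (1.0 + root) ** 2
    h = (s_hi - s_lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        s = s_lo + (k + 0.5) * h                       # s = sqrt(t)
        dens = math.sqrt(max((hi - s * s) * (s * s - lo), 0.0)) / (math.pi * gamma * s)
        total += (s - Lam) ** power * dens
    return total * h

def asymptotic_mse(Lam, rho, beta, alpha=1.0):
    """M(Lam; rho, rho_tilde, alpha) as in (64), with rho_tilde = beta * rho and 0 < rho < 1."""
    rho_t = beta * rho
    gamma = (beta - rho_t) / (1.0 - rho_t)
    return (rho + rho_t - rho * rho_t
            + (1.0 - rho_t) * (rho * Lam ** 2
                               + alpha * (1.0 - rho) * mp_integral(Lam, gamma, 2)))

def argmin_golden(func, a, b, tol=1e-4):
    """Golden-section search for the minimizer of a unimodal function on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = func(c), func(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = func(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = func(d)
    return 0.5 * (a + b)

rho, beta = 0.1, 0.5
gamma = (beta - rho * beta) / (1.0 - rho * beta)
upper = 1.0 + math.sqrt(gamma)                         # sqrt(gamma_+)
grid = [upper * k / 14 for k in range(15)]
values = [asymptotic_mse(L, rho, beta) for L in grid]
second_diffs = [values[k - 1] - 2 * values[k] + values[k + 1] for k in range(1, 14)]
Lam_star = argmin_golden(lambda L: asymptotic_mse(L, rho, beta), 0.0, upper)
```

At $\Lambda = 0$ and $\alpha = 1$ the expression equals 1, the second differences on the grid are positive, and the located minimizer lies strictly inside the interval, in agreement with the sign pattern of the derivative at the endpoints.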

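This allows us to prove our third main result.

Proof of Theorem 3. By Lemma 12,

$$\begin{aligned}
\mathcal{M}(\rho, \beta | X) &= \lim_{n\to\infty} \inf_{\lambda} \sup_{\substack{X_0 \in M_{m\times n} \\ \mathrm{rank}(X_0) \le r}} R(\hat X_\lambda, X_0) \\
&= \inf_{\lambda} \lim_{n\to\infty} \sup_{\substack{X_0 \in M_{m\times n} \\ \mathrm{rank}(X_0) \le r}} R(\hat X_\lambda, X_0) \\
&= \inf_{\lambda} M\!\left( \frac{\lambda}{\sqrt{1 - \tilde\rho}} ;\, \rho, \tilde\rho, \alpha \right) \\
&= \inf_{\Lambda} M(\Lambda; \rho, \tilde\rho, \alpha) \\
&= \min_{\Lambda} M(\Lambda; \rho, \tilde\rho, \alpha) ,
\end{aligned}$$

with $\alpha = 1$ for case Mat and $\alpha = 1/2$ for case Sym, where we have used Lemma 13,
which also asserts that the minimum is unique. □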

7 Minimax AMSE 



Having established that the asymptotic worst-case MSE (17) satisfies (18) and (19), we
turn to its minimizer $\Lambda_*$. The notation follows (20).

Proof of Theorem 4. By (65) above, the condition

$$\frac{\partial M(\Lambda; \rho, \tilde\rho, \alpha)}{\partial\Lambda} = 0$$

is thus equivalent, for any $\rho \in [0, 1]$, to

$$f(\Lambda, \rho) := \rho \Lambda - \alpha (1 - \rho) \int_{\Lambda^2}^{\gamma_+} (\sqrt{t} - \Lambda) \, p_\gamma(t) \, dt = 0 , \tag{66}$$

establishing (23) in particular for $0 < \rho < 1$. By Lemma 13 the minimum exists and is
unique, namely this equation has a unique root in $\Lambda$. One directly verifies that $f(1 +
\sqrt{\beta}, 0) = f(0, 1) = 0$. The limits (21) and (22) follow from the fact that $\rho \mapsto \Lambda_*(\rho, \cdot)$ is
decreasing. To establish this, it is enough to observe that $\partial f/\partial\rho > 0$ for all $(\Lambda, \rho)$, which
can be verified directly. □
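Condition (66) suggests a simple way to compute the minimax threshold numerically: since $f(0, \rho) < 0 < f(\sqrt{\gamma_+}, \rho)$ for $0 < \rho < 1$, bisection finds the unique root. The sketch below is one way to do this under stated assumptions; the function names, the midpoint quadrature in the variable $s = \sqrt{t}$ and the tolerance are choices made here, and the code assumes $0 < \rho < 1$ so that $\gamma > 0$.

```python
import math

def first_excess_moment(Lam, gamma, n_grid=4000):
    """Midpoint rule for the integral of (sqrt(t) - Lam) p_gamma(t) dt over [Lam^2, gamma_+]."""
    root = math.sqrt(gamma)
    s_lo, s_hi = max(Lam, 1.0 - root), 1.0 + root
    if s_lo >= s_hi:
        return 0.0
    lo, hi = (1.0 - root) ** 2, (1.0 + root) ** 2
    h = (s_hi - s_lo) / n_grid
    total = 0.0
    for k in range(n_grid):
        s = s_lo + (k + 0.5) * h                       # s = sqrt(t)
        dens = math.sqrt(max((hi - s * s) * (s * s - lo), 0.0)) / (math.pi * gamma * s)
        total += (s - Lam) * dens
    return total * h

def f(Lam, rho, beta, alpha=1.0):
    """Left-hand side of (66); requires 0 < rho < 1."""
    gamma = (beta - rho * beta) / (1.0 - rho * beta)
    return rho * Lam - alpha * (1.0 - rho) * first_excess_moment(Lam, gamma)

def minimax_threshold(rho, beta, alpha=1.0, tol=1e-8):
    """Bisection for the root of f(., rho) on [0, sqrt(gamma_+)]."""
    gamma = (beta - rho * beta) / (1.0 - rho * beta)
    lo, hi = 0.0, 1.0 + math.sqrt(gamma)               # f(lo) < 0 < f(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, rho, beta, alpha) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

beta = 0.5
thresholds = [minimax_threshold(rho, beta) for rho in (0.05, 0.2, 0.5)]
```

The computed roots decrease as $\rho$ grows and stay below $1 + \sqrt{\beta}$, which is consistent with the monotonicity used in the proof.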

We proceed to examine the special case $\beta = 1$.

Proof of Theorem 5. When $\beta = 1$, $\gamma_+ = 4$ and $\gamma_- = 0$ in (15). Changing the integration
variable by $t \mapsto t^2$ in (16) we get

$$P_1(x; k) = \frac{1}{\pi} \int_{\sqrt{x}}^{2} t^{2k} \sqrt{4 - t^2} \, dt , \tag{67}$$

namely the $2k$-th incomplete moment of the Quarter Circle law. Substituting this, together
with $\tilde\rho = \rho$ and $\gamma = 1$, into (17) we recover (24). The identities (25), (26) and (27) may be directly
verified by differentiation.

To show that $\Lambda_*(\rho, \rho, \alpha)$ satisfies (29), observe that the condition (66), which is equivalent
to the general minimizer characterization (23), may in the case $\beta = 1$ be written
in the equivalent form

$$\frac{\alpha}{\pi} \int_{\Lambda}^{2} (t - \Lambda) \sqrt{4 - t^2} \, dt = \frac{\Lambda \rho}{1 - \rho} . \tag{68}$$

Clearly any solution $\Lambda_*$ satisfies $0 < \Lambda_* < 2$ and we define

$$\theta_\alpha(\rho) = \arcsin(\Lambda_*(\rho, \rho, \alpha)/2) . \tag{69}$$

Changing the integration variable $t \mapsto 2 \sin\varphi$ and substituting $\Lambda = 2 \sin\theta$ in (68), we
find that (68) is equivalent to

$$\frac{4\alpha}{\pi} \int_{\theta}^{\pi/2} (\sin\varphi - \sin\theta) \cos^2\varphi \, d\varphi = \frac{\rho}{1 - \rho} \sin\theta , \tag{70}$$

which is in turn equivalent to (29) by elementary integration. □
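The change of variables from (68) to (70) can be checked numerically. In the sketch below the closed form in `trig_integral` is obtained here by elementary integration and is offered as an assumption to be verified, not as a transcription of (29); `rho_of_theta` and `lhs_68` are hypothetical helper names. Given $\theta$, the code solves (70) for $\rho$ and then confirms that (68) holds at $\Lambda = 2\sin\theta$.

```python
import math

def trig_integral(theta):
    """Closed form of the integral of (sin(phi) - sin(theta)) cos(phi)^2 over [theta, pi/2]."""
    return (math.cos(theta) ** 3 / 3.0
            - math.sin(theta) * (math.pi / 4.0 - theta / 2.0 - math.sin(2.0 * theta) / 4.0))

def rho_of_theta(theta, alpha=1.0):
    """Solve (70) for rho, given theta in (0, pi/2)."""
    c = 4.0 * alpha / math.pi * trig_integral(theta) / math.sin(theta)   # equals rho / (1 - rho)
    return c / (1.0 + c)

def lhs_68(Lam, alpha=1.0, n_grid=20000):
    """Midpoint rule for (alpha / pi) times the integral of (t - Lam) sqrt(4 - t^2) over [Lam, 2]."""
    h = (2.0 - Lam) / n_grid
    total = 0.0
    for k in range(n_grid):
        t = Lam + (k + 0.5) * h
        total += (t - Lam) * math.sqrt(4.0 - t * t)
    return alpha / math.pi * total * h

theta = math.pi / 6
rho = rho_of_theta(theta)
Lam = 2.0 * math.sin(theta)
residual = lhs_68(Lam) - Lam * rho / (1.0 - rho)
```

The residual is zero up to quadrature error, so for this $\theta$ the pair $(\Lambda, \rho)$ obtained from (70) also satisfies (68).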

Our result regarding parametric representation of the minimax AMSE curve for $\beta =
1$ follows immediately:

Proof of Theorem 7. The curve is parametrized using the parameter $\theta$ above. In each
of the cases Mat and Sym, the formula is obtained by solving (29) for $\rho$ to obtain $\rho(\theta)$,
and simplifying $M(\Lambda(\theta); \rho(\theta), \rho(\theta), \alpha)$, where $\Lambda(\theta) = 2 \sin\theta$ and $M$ is defined in (24).
We omit the elementary algebra. □

Toward the proof of Theorem 6, we recall a trivial fact about convergence of minimizers.

Lemma 14. Let $f_n : [a, b] \to \mathbb{R}$ be a sequence of continuous functions on $[a, b] \subset \mathbb{R}$
and assume that $\{f_n\}$ converges pointwise to $f : [a, b] \to \mathbb{R}$. If $x_n \in [a, b]$ is the
unique minimizer of $f_n$ ($n = 1, 2, \dots$), and $x \in [a, b]$ is the unique minimizer of $f$,
then $\lim_{n\to\infty} x_n = x$.

Proof. Let $\{x_{n_k}\}$ be a convergent subsequence of $\{x_n\}$, and write $\lim_{k\to\infty} x_{n_k} = y$. It is
enough to show that $y = x$. Since $f_n$ is continuous on a compact interval, it is uniformly
continuous, hence $f_{n_k}(x_{n_k}) \to f(y)$. Since $x_{n_k}$ is a minimizer of $f_{n_k}$, for all $k$ we have
$f_{n_k}(x_{n_k}) \le f_{n_k}(x)$. In the limit $k \to \infty$ this inequality yields $f(y) \le f(x)$. Since $x$ is a
minimizer of $f$, $f(x) \le f(y)$. Therefore $f(x) = f(y)$. It follows that $y$ is a minimizer of
$f$, which is unique by assumption, so that $x = y$. □

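Proof of Theorem 6. Define

$$\Lambda_*^n(r, m, \alpha) = \mathrm{argmin}_{\Lambda} \, M_n(\Lambda; r, m, \alpha) , \tag{71}$$

and recall the definition of $\Lambda_*(\rho, \beta, \alpha)$ in (20). By Theorem 1, Lemma 9 and Lemma
12, the function sequence $\Lambda \mapsto M_n(\Lambda; r, m, \alpha)$ converges pointwise to the function $\Lambda \mapsto
M(\Lambda; \rho, \tilde\rho, \alpha)$ on $\Lambda \in [0, \gamma_+]$. We invoke Lemma 14 to obtain that the minimizers of the
former converge to the minimizer of the latter, namely

$$\lim_{n\to\infty} \Lambda_*^n(r, m, \alpha) = \Lambda_*(\rho, \beta, \alpha) . \tag{72}$$

Observe that

$$\begin{aligned}
\lambda_*(m, n, r | X) &= \mathrm{argmin}_{\lambda} \sup_{\substack{X_0 \in X_{m,n} \\ \mathrm{rank}(X_0) \le r}} \frac{1}{mn} E_{X_0} \big\| \hat X_\lambda(X_0 + Z) - X_0 \big\|_F^2 \\
&= \sqrt{n} \cdot \mathrm{argmin}_{\lambda} \sup_{\substack{X_0 \in X_{m,n} \\ \mathrm{rank}(X_0) \le r}} E_{X_0} \big\| \hat X_\lambda(X_0 + Z/\sqrt{n}) - X_0 \big\|_F^2 \\
&= \sqrt{n} \cdot \mathrm{argmin}_{\lambda} \, M_n\!\left( \frac{\lambda}{\sqrt{1 - r/n}} ;\, r, m, \alpha \right) \\
&= \sqrt{n} \cdot \sqrt{1 - r/n} \cdot \mathrm{argmin}_{\Lambda} \, M_n(\Lambda; r, m, \alpha) \\
&= \sqrt{n} \cdot \sqrt{1 - r/n} \cdot \Lambda_*^n(r, m, \alpha) ,
\end{aligned}$$

with $\alpha = 1$ for case Mat and $\alpha = 1/2$ for case Sym. Since $\lim_{n\to\infty} \sqrt{1 - r/n} = \sqrt{1 - \beta\rho}$,
we thus have

$$\lim_{n\to\infty} \frac{\lambda_*(m, n, r | X)}{\sqrt{n}} = \lim_{n\to\infty} \sqrt{1 - r/n} \cdot \Lambda_*^n(r, m, \alpha) = \sqrt{1 - \beta\rho} \cdot \Lambda_*(\rho, \beta, \alpha) ,$$

where we have used (72). □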



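8 Minimax AMSE in the Low Rank Limit $\rho \to 0$

We proceed to evaluate the limit $\lim_{\rho\to 0} \mathcal{M}(\rho, \beta | X)/\rho$.

Proof of Theorem 8. We first show that

$$\lim_{\rho\to 0} \frac{1}{\rho} \int_{\Lambda_*^2(\rho)}^{\gamma_+(\rho)} \big( \sqrt{t} - \Lambda_*(\rho) \big)^2 p_{\gamma(\rho)}(t) \, dt = 0 . \tag{73}$$

Observe that by (23), which is equivalent to (66), the minimizer $\Lambda_*$ satisfies

$$\int_{\Lambda_*^2(\rho)}^{\gamma_+(\rho)} \big[ \sqrt{t} - \Lambda_*(\rho) \big] p_{\gamma(\rho)}(t) \, dt = \frac{\rho \Lambda_*(\rho)}{\alpha (1 - \rho)} ,$$

so that

$$\lim_{\rho\to 0} \int_{\Lambda_*^2(\rho)}^{\gamma_+(\rho)} \big[ \sqrt{t} - \Lambda_*(\rho) \big] p_{\gamma(\rho)}(t) \, dt = 0 .$$

Differentiating (73) under the integral sign, it remains to show that

$$\lim_{\rho\to 0} \int_{\Lambda_*^2(\rho)}^{\gamma_+(\rho)} t^{k/2} \, \frac{\partial p_{\gamma(\rho)}}{\partial\rho}(t) \, dt = 0$$

for $k = 0, 1, 2$. Note that $\lim_{\rho\to 0} \Lambda_*^2(\rho) = \lim_{\rho\to 0} \gamma_+(\rho) = (1 + \sqrt{\beta})^2$. However, it is easy
to verify that $\partial p_{\gamma(\rho)}/\partial\rho \le C \cdot (\gamma_+(\rho) - t)^{-1/2}$ in a neighborhood of $(1 + \sqrt{\beta})^2$, for an
appropriate constant $C$, so that

$$\lim_{\rho\to 0} \int_{\Lambda_*^2(\rho)}^{\gamma_+(\rho)} t^{k/2} \, \frac{\partial p_{\gamma(\rho)}}{\partial\rho}(t) \, dt \le \lim_{\rho\to 0} C \sqrt{\gamma_+(\rho) - t} \, \Big|_{t = \gamma_+(\rho)}^{t = \Lambda_*^2(\rho)} = 0 .$$

We now proceed to calculate the required limit. In the case Mat, by (21) we have

$$\lim_{\rho\to 0} \Lambda_*(\rho, \beta, \alpha) = 1 + \sqrt{\beta} .$$

Therefore

$$\begin{aligned}
\lim_{\rho\to 0} \frac{1}{\rho} \mathcal{M}(\rho, \beta | \mathrm{Mat}) &= \lim_{\rho\to 0} \frac{1}{\rho} M(\Lambda_*(\rho, \beta, 1); \rho, \tilde\rho, 1) \\
&= \lim_{\rho\to 0} \frac{1}{\rho} \left( \rho + \tilde\rho - \rho\tilde\rho + \rho (1 - \tilde\rho) \Lambda_*^2(\rho, \beta, 1) \right) \\
&= \lim_{\rho\to 0} \left( 1 + \beta - \rho\beta + (1 - \rho\beta)(1 + \sqrt{\beta})^2 \right) \\
&= 1 + \beta + (1 + \sqrt{\beta})^2 = 2 \left( 1 + \sqrt{\beta} + \beta \right) ,
\end{aligned}$$

where we have used (73) and the relation $\tilde\rho = \beta\rho$. The calculation for case Sym is
identical. □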



9 Global Minimax MSE and AMSE 

We first give sense to the notion of a general singular-value-based denoiser. This is to
be a mapping $Y \mapsto \hat X(Y)$ that acts on $Y$ only through its singular values, i.e. a mapping
$\hat X : M_{m\times n} \to M_{m\times n}$ of the form

$$\hat X(Y) = U_Y \cdot \hat x(y)_\Delta \cdot V_Y' , \tag{74}$$

where $Y = U_Y \cdot y_\Delta \cdot V_Y'$ and $\hat x : [0, \infty)^m \to [0, \infty)^m$. The mapping in (74) is not
well-defined in general, since the SVD of $Y$, and in particular the order of the singular values
in the vector $y$, is not uniquely determined. Well-definedness of (74) will obtain when
each function $\hat x_i : [0, \infty)^m \to [0, \infty)$ is invariant under permutations of its coordinates.
Since the equality $Y = U_Y \cdot y_\Delta \cdot V_Y'$ may hold for vectors $y$ with negative entries, we are
led to the following definition.

Definition 5. By Singular Value Denoiser we mean any measurable mapping $\hat X : M_{m\times n} \to
M_{m\times n}$ which takes the form (74), where each entry of $\hat x$ is a function $\hat x_i : \mathbb{R}^m \to \mathbb{R}$ that is
invariant under permutation and sign changes of its coordinates. We let $\mathcal{D}$ denote the
class of such mappings.

With this definition, $Y \mapsto \hat X(Y)$ in (74) is well defined. For a detailed introduction
to real-valued or matrix-valued functions which depend on a matrix variable only
through its singular values, see [21, 22].



Proof of Theorem 9. Let $O_n$ denote the orthogonal group in $M_{n\times n}$ and let $G = O_m \times O_n$
act on $M_{m\times n}$ by $(U, V) : X \mapsto U' \cdot X \cdot V$. Recall that a decision rule $\hat X$ satisfying
$U \hat X(Y) V' = \hat X(UYV')$ for all $(U, V) \in G$ is called equivariant with respect to this
group action (see [23, def. 2.5]). In yj, cor. 7] it is shown that $\mathcal{D}$ coincides with the family
of equivariant decision rules (see also [21, prop. 5.1]). Now, the Hunt-Stein Theorem
[23, 24] implies that a lower bound on the minimax MSE over $\mathcal{D}$ is also a lower bound
on the global minimax MSE. Let $\hat X(Y) \in \mathcal{D}$. We will show that

$$\sup_{\mathrm{rank}(X_0) \le r} R(\hat X, X_0) \ge \frac{r}{m} + \frac{r}{n} - \frac{r^2}{mn} - \frac{r}{mn} .$$

Indeed, let $X_0 \in M_{m\times n}$ be a fixed arbitrary matrix of rank $r$. The calculation leading
to (55) is valid for any rule in $\mathcal{D}$, and implies that $R(\hat X(Y), X_0) \ge 1 - \frac{1}{m} E \|z\|_2^2$, where
$Y = U_Y \cdot y_\Delta \cdot V_Y'$ and

$$z = \tfrac{1}{\sqrt{n}} (U_Y' \cdot Z \cdot V_Y)_\Delta . \tag{75}$$

Write $Y_\mu = \mu X_0 + Z/\sqrt{n} = U_\mu \cdot (y_\mu)_\Delta \cdot V_\mu'$ and let $z_\mu = \tfrac{1}{\sqrt{n}} (U_\mu' \cdot Z \cdot V_\mu)_\Delta$. We therefore
have

$$\sup_{\substack{X_0 \in X_{m,n} \\ \mathrm{rank}(X_0) \le r}} R(\hat X, X_0) \ge \lim_{\mu\to\infty} R(\hat X, \mu X_0) \ge 1 - \frac{1}{m} \lim_{\mu\to\infty} E \|z_\mu\|_2^2 .$$

Combining (63) and (61), we have already seen that

$$\frac{1}{m} \sum_{i=r+1}^{m} \lim_{\mu\to\infty} E (z_{\mu,i})^2 = 1 - \frac{r}{m} - \frac{r}{n} + \frac{r^2}{mn} .$$

A similar argument yields

$$\frac{1}{m} \sum_{i=1}^{r} \lim_{\mu\to\infty} E (z_{\mu,i})^2 = \frac{r}{mn} ,$$

and the first part of the theorem follows. The second part of the theorem follows since,
taking the limit $n \to \infty$ as prescribed, we have $r/m \to \rho$, $r/n \to \tilde\rho$ and $r/mn \to 0$. For
the third part of the theorem, we have by Theorem 8

$$\lim_{\rho\to 0} \frac{\mathcal{M}(\rho, \beta | X)}{\mathcal{M}^-(\rho, \beta)} = \lim_{\rho\to 0} \frac{\mathcal{M}(\rho, \beta | X)}{\rho + \beta\rho - \beta\rho^2} = \frac{2 (1 + \sqrt{\beta} + \beta)}{1 + \beta} = 2 \left( 1 + \frac{\sqrt{\beta}}{1 + \beta} \right) .$$

□



10 Discussion 

10.1 Similarities Emerging from our Proofs. 

In the introduction, we pointed out several ways that these matrix denoising results for
SVST estimation of low-rank matrices parallel results for soft thresholding of sparse
vectors. Our derivation of the minimax MSE formulas exposed two more parallels.

• Common Structure of minimax MSE formulas. The minimax MSE formula for the vector
denoising problem involves certain incomplete moments of the standard Gaussian
distribution. The matrix denoising problem involves completely analogous
incomplete moments, only replacing the Gaussian by the Marcenko-Pastur
distribution or (in the square case $\beta = 1$) the quarter-circle law.

• Monotonicity of SURE. In both settings, the least favorable estimand places the
signal "at $\infty$", which yields a convenient formula for Minimax MSE. In each setting,
validation of the least-favorable estimand flows from monotonicity, in an
appropriate sense, of Stein's Unbiased Risk Estimate within that specific setting.

10.2 Another Connection: Block Thresholding

The parallels between the problems do not seem accidental. An interesting denoising
problem intermediate between sparse vector and low-rank matrices involves
block-sparse vectors izt]. Expressing that problem in this paper's notation, the object $X_0$ is
an $m$-by-$n$ array, and we call the columns "blocks"; so each block has $m$ entries. We
assume at most a fraction $\epsilon$ of blocks are nonzero: $\epsilon \cdot n \ge \#\{ j : \|X_0(\cdot, j)\|_2 \ne 0 \}$. We
observe noisy matrix data $Y = X_0 + Z$ and we consider the block-shrinkage problem:

$$(P_{2,1}) \qquad \hat X_\lambda = \mathrm{argmin}_{X \in M_{m\times n}} \ \tfrac{1}{2} \|Y - X\|_F^2 + \lambda \sum_j \|X(\cdot, j)\|_2 . \tag{76}$$

By the inequalities

$$\|X\|_* \le \sum_j \|X(\cdot, j)\|_2 \le \sum_{i,j} |X(i, j)| , \tag{77}$$

(NNP) is a relaxation of $(P_{2,1})$, while $(P_{2,1})$ is a relaxation of $(P_1)$ applied to $\mathrm{vec}(Y)$.
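The chain (77) holds for every matrix: the first inequality is the triangle inequality for the nuclear norm applied to the rank-one column pieces of $X$, and the second is $\|\cdot\|_2 \le \|\cdot\|_1$ column by column. A short NumPy check on an arbitrary random matrix (the size and seed are hypothetical) reads:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 6))

nuclear = np.linalg.svd(X, compute_uv=False).sum()   # ||X||_*, sum of singular values
col_norms = np.linalg.norm(X, axis=0).sum()          # sum_j ||X(:, j)||_2
l1 = np.abs(X).sum()                                 # sum_{i,j} |X(i, j)|
```

For a matrix with a single nonzero column all three quantities differ only through the $\ell_2$ versus $\ell_1$ norm of that column, and the first inequality becomes an equality.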

In the case $m = 1$ we recover $(P_1)$ and the soft thresholding procedure. In the
case $m > 1$ we obtain block thresholding; it promotes reconstructions $\hat{X}_\lambda$ where many
blocks are fully zero and a small fraction are nonzero; the nonzero blocks are those
where $\|Y(\cdot,j)\|_2 > \lambda$.
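
To make the three procedures concrete, here is a hedged NumPy sketch of scalar soft thresholding, block (column-wise) soft thresholding, and SVST. It assumes the objective is normalized so that the threshold equals $\lambda$, as in the abstract's $\|Y-X\|_F^2/2$ convention; the function names are hypothetical.

```python
import numpy as np

def soft_threshold(y, lam):
    """Entrywise soft thresholding: shrink each |y_i| by lam, clipping at zero."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def block_soft_threshold(Y, lam):
    """Shrink the l2 norm of every column (block) of Y by lam, clipping at zero."""
    norms = np.linalg.norm(Y, axis=0)
    scale = np.zeros_like(norms)
    keep = norms > lam                      # only these blocks stay nonzero
    scale[keep] = 1.0 - lam / norms[keep]
    return Y * scale

def svst(Y, lam):
    """Singular value soft thresholding: shrink each singular value by lam."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

# With m = 1 every block is a scalar, so block thresholding reduces to soft thresholding.
Y1 = np.array([[3.0, -0.5, -2.0]])
print(block_soft_threshold(Y1, 1.0), soft_threshold(Y1, 1.0))
```

All three functions apply the same scalar rule $t \mapsto (t-\lambda)_+$ to a different object: absolute values of entries, column norms, or singular values.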

The chain of inequalities (77) places $(P_{2,1})$ intermediate between $(P_1)$ and (NNP). All
the parallels mentioned so far between soft thresholding and SVST also hold between
block soft thresholding and the other two methods.

• All three involve soft thresholding of relevant objects - scalars, column norms, or 
singular values. 

• All three have a least favorable estimand with its nonzero piece "at $\infty$".

• All three have a minimax penalty factor $\lambda^*(\varepsilon)$ monotone decreasing in $\varepsilon$.

Moreover, the two parallels of the last section carry over as well. For block thresholding,
the minimax MSE involves incomplete moments, this time of the classical $\chi_m$
distribution. The monotonicity of SURE carries through and implies the structure of
the least-favorable estimand.

10.3 Block Thresholding and the Minimaxity Gap 

We bring up block soft thresholding because of its relevance to the minimaxity gap
of SVST that we conjectured in the introduction. Donoho, Johnstone and Montanari
\0\ considered the following limiting case, where we let $n \to \infty$ first, and later
$m \to \infty$. In that setting, the minimax MSE among all measurable procedures under
$\varepsilon$-block sparsity can be evaluated:

$$M^{\mathrm{all}}_{\mathrm{block}}(\varepsilon) = \lim_{m\to\infty}\,\lim_{n\to\infty}\ \inf_{\hat{X}}\ \sup_{\varepsilon n \,\ge\, \#\{j \,:\, X_0(\cdot,j)\neq 0\}}\ \frac{1}{mn}\,E\|\hat{X} - X_0\|_F^2;$$

it obeys $M^{\mathrm{all}}_{\mathrm{block}}(\varepsilon) = \varepsilon$, and they show it can be attained asymptotically by a particularly
lovely method: simply apply the James-Stein shrinkage estimator blockwise! On the
other hand, the minimax MSE for soft block thresholding can be evaluated:

$$M^{\mathrm{soft}}_{\mathrm{block}}(\varepsilon) = \lim_{m\to\infty}\,\lim_{n\to\infty}\ \inf_{\lambda}\ \sup_{\varepsilon n \,\ge\, \#\{j \,:\, X_0(\cdot,j)\neq 0\}}\ \frac{1}{mn}\,E\|\hat{X}_\lambda - X_0\|_F^2.$$

They obtain $M^{\mathrm{soft}}_{\mathrm{block}}(\varepsilon) = 2\varepsilon - \varepsilon^2$. Consequently, block soft thresholding is never worse
than a factor of 2 from minimaxity, at any level of block sparsity. This bound is achieved
as $\varepsilon \to 0$:

$$\lim_{\varepsilon\to 0}\frac{M^{\mathrm{soft}}_{\mathrm{block}}(\varepsilon)}{M^{\mathrm{all}}_{\mathrm{block}}(\varepsilon)} = 2;$$

namely, in the high-dimensional limit, under extreme sparsity, block soft thresholding
is a factor 2 worse than minimax. These completed results about the minimaxity gap
in high-dimensional block soft thresholding are suggestive from the viewpoint of Singular
Value Soft Thresholding. Could there be an estimator improving on SVST, and
particularly lovely in form?
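
The factor-of-2 statement follows directly from the two closed forms quoted above ($\varepsilon$ for all procedures, $2\varepsilon - \varepsilon^2$ for block soft thresholding). A small sketch, with hypothetical function names, tabulates the ratio:

```python
def minimax_all(eps):
    """Limiting minimax MSE over all procedures, as quoted in the text: eps."""
    return eps

def minimax_soft(eps):
    """Limiting minimax MSE of block soft thresholding, as quoted: 2*eps - eps**2."""
    return 2.0 * eps - eps ** 2

def gap(eps):
    """Ratio of the two minimax MSEs; algebraically this equals 2 - eps."""
    return minimax_soft(eps) / minimax_all(eps)

for eps in (1.0, 0.5, 0.1, 1e-3, 1e-6):
    print(eps, gap(eps))
```

The ratio equals $2 - \varepsilon$, so it never exceeds 2 and approaches 2 only in the extreme-sparsity limit $\varepsilon \to 0$.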

A Convexity of the Stein Unbiased Risk Estimate for SVST

In this section we prove Lemma 7, whereby the Stein Unbiased Risk Estimate (SURE)
for SVST is given as a finite sum of bounded, orthogonally invariant and quasi-convex
functions.



This appendix employs the following notation. Following [21, 22], we say that a
function $f : \mathbb{R}^m \to \mathbb{R}$ is absolutely symmetric if it is invariant under permutations and
sign changes of its coordinates. As discussed in Section 9, the function of a matrix argument
$F : M_{m\times n} \to \mathbb{R}$ defined by $F(X) = f(x)$, where $X = U_X \cdot x_\Delta \cdot V_X'$ for some orthogonal
$U_X \in O_m$ and $V_X \in O_n$, is well defined only if $f$ is absolutely symmetric. Denote
by $W_m = \{y_1 \ge \cdots \ge y_m \ge 0\} \subset \mathbb{R}^m$ the set of singular value vectors (in non-increasing
order) of matrices in $M_{m\times n}$ and by $W_m^+ = \{y_1 > \cdots > y_m > 0\} \subset \mathbb{R}^m$ the space of
non-degenerate singular value vectors. We denote by $\Sigma : M_{m\times n} \to W_m$ the map $Y \mapsto y$
that maps a matrix $Y$ to its singular value vector $y$, sorted in non-increasing order. $\Sigma$
is orthogonally invariant, namely invariant under the transformation $X \mapsto U \cdot X \cdot V$
for any orthogonal $U \in O_m$ and $V \in O_n$. Also note that if $Y = X + \sigma Z \in M_{m\times n}$ with
$Z_{i,j} \overset{\mathrm{iid}}{\sim} \mathcal{N}(0,1)$, then the event $\Sigma(Y) \in W_m^+$ holds almost surely.



Let $\hat{X}$ be a weakly differentiable estimator of $X_0$ from data $Y = X_0 + \sigma Z$, where $Z$
has i.i.d. standard normal entries. The Stein Unbiased Risk Estimate [25] is a function
of the data, $Y \mapsto SURE(Y)$, for which $E\,SURE(Y) = E\|\hat{X} - X_0\|_F^2$. In our case, $X_0$, $Z$
and $Y$ are matrices in $M_{m\times n}$, and Stein's theorem [25, thm. 1] implies that for

$$SURE(Y) = mn\sigma^2 + \|\hat{X}(Y) - Y\|_F^2 + 2\sigma^2 \sum_{i,j} \frac{\partial(\hat{X}(Y) - Y)_{i,j}}{\partial Y_{i,j}} \tag{78}$$

we have

$$E_{X_0}\,SURE(Y) = E_{X_0}\|\hat{X} - X_0\|_F^2.$$



The following lemma specializes the general SURE formula (78) to the case of matrix
estimators that act through the data singular values. This calculation has been
carried out independently by [3]. While here we use the SURE formula for a theoretical
purpose, [3] proposed to use it to select a threshold for matrix denoising by SVST
in applications.

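Lemma 15. Write the singular value decomposition of $Y \in M_{m\times n}$ as $Y = U_Y \cdot y_\Delta \cdot V_Y'$.
Let $\hat{X}$ be any weakly differentiable element of $\mathcal{V}$, as in Definition 5. Consider $\hat{X}$ as an
estimator of $X_0$ from $Y = X_0 + \sigma Z$. Then for any $Y \in M_{m\times n}$ such that $\Sigma(Y) \in W_m^+$, we
have $SURE(Y) = sure(\Sigma(Y))$, where $sure : W_m^+ \to \mathbb{R}$ is given by

$$sure(y) = mn\sigma^2 + \|g(y)\|^2 + 2\sigma^2\left[\sum_{i=1}^m \frac{\partial g(y)_i}{\partial y_i} + \sum_{i\neq j} \frac{g(y)_i y_i - g(y)_j y_j}{y_i^2 - y_j^2} + (n-m)\sum_{i=1}^m \frac{g(y)_i}{y_i}\right]. \tag{79}$$

Here, $g(y) = x - y$ and $\partial/\partial y_i$ is a weak derivative.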
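
As a sanity check on the bracketed divergence term, the following sketch compares it with a finite-difference divergence of $G(Y) = \hat{X}(Y) - Y$ for the SVST estimator, for which $g(y)_i = -\min\{y_i, \lambda\}$ (the substitution used later in this appendix). It assumes $m \le n$ and distinct singular values away from $\lambda$; the function names, the test matrix and the step size `h` are illustrative assumptions.

```python
import numpy as np

def svst(Y, lam):
    """Singular value soft thresholding of Y at level lam."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def divergence_closed_form(Y, lam):
    """Divergence of G(Y) = SVST(Y) - Y from the singular values (assumes m <= n)."""
    m, n = Y.shape
    y = np.linalg.svd(Y, compute_uv=False)
    g = -np.minimum(y, lam)                  # g(y)_i for SVST
    dg = -(y < lam).astype(float)            # derivative of g_i away from y_i = lam
    gy = g * y
    num = gy[:, None] - gy[None, :]          # g_i y_i - g_j y_j
    den = y[:, None] ** 2 - y[None, :] ** 2  # y_i^2 - y_j^2
    off = ~np.eye(m, dtype=bool)             # all ordered pairs with i != j
    return dg.sum() + (num[off] / den[off]).sum() + (n - m) * (g / y).sum()

def divergence_numeric(Y, lam, h=1e-6):
    """Central finite-difference estimate of sum_ij dG_ij / dY_ij."""
    m, n = Y.shape
    total = 0.0
    for i in range(m):
        for j in range(n):
            E = np.zeros_like(Y)
            E[i, j] = h
            Gp = svst(Y + E, lam) - (Y + E)
            Gm = svst(Y - E, lam) - (Y - E)
            total += (Gp[i, j] - Gm[i, j]) / (2.0 * h)
    return total

rng = np.random.default_rng(0)
Y = rng.standard_normal((3, 5))
y = np.linalg.svd(Y, compute_uv=False)
lam = 0.5 * (y[0] + y[1])   # threshold strictly between the two largest singular values
closed = divergence_closed_form(Y, lam)
numeric = divergence_numeric(Y, lam)
print(closed, numeric)
```

Multiplying either divergence by $2\sigma^2$ and adding $mn\sigma^2 + \|g(y)\|^2$ gives the SURE value for this $Y$.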

Note that $y \mapsto sure_\lambda(|y|)$ is absolutely symmetric. Below, we extend the domain of
$sure_\lambda$ by symmetry and consider $sure_\lambda : \mathbb{R}^m \to \mathbb{R}$.

Let us first show that Lemma 7, whereby $SURE_\lambda$ is a bounded, orthogonally invariant
and quasi-convex function of a matrix argument, follows from Lemma 15. To establish
quasi-convexity of $SURE_\lambda$ we will need to relate the quasi-convexity of $SURE_\lambda$,
a function of a matrix argument, to quasi-convexity of $sure_\lambda$, a function of singular
values.

Lemma 16. Let $f : \mathbb{R}^m \to \mathbb{R}$ be an absolutely symmetric and lower-semicontinuous
function. If $f$ is quasi-convex then the function $F : M_{m\times n} \to \mathbb{R}$ defined by $F(X) =
f(\Sigma(X))$ is quasi-convex on $M_{m\times n}$.

Proof. We use the following characterization of quasi-convexity by subgradients due
to Aussel [26, thm. 2.2]: a lower-semicontinuous function $f$ on a Banach space $\mathcal{X}$ is
quasi-convex if and only if the following condition holds for all $x, y \in \mathcal{X}$:

$$\exists\, x^* \in \partial f(x):\ \langle x^*, y - x\rangle > 0 \ \Longrightarrow\ \forall\, y^* \in \partial f(y):\ \langle y^*, y - x\rangle \ge 0. \tag{80}$$
Now, Lewis [22, cor. 2.5] has provided the following characterization of subgradients
of $F$. Let $X, Y \in M_{m\times n}$. Then $Y \in \partial F(X)$ if and only if $\Sigma(Y) \in \partial f(\Sigma(X))$ and
there exist orthogonal matrices $U \in O_m$ and $V \in O_n$ and vectors $x, y \in \mathbb{R}^m$ such that

$$X = U \cdot x_\Delta \cdot V' \quad\text{and}\quad Y = U \cdot y_\Delta \cdot V'.$$

We also recall von Neumann's inequality for singular values [22, thm 2.1], whereby
if $X, Y \in M_{m\times n}$ have singular value vectors $x, y \in \mathbb{R}^m$, then $\langle X, Y\rangle \le \langle x, y\rangle$. Here and
below, $\langle X, Y\rangle = \sum_{i,j} X_{i,j}Y_{i,j}$ is the Euclidean inner product on $M_{m\times n}$.
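
Von Neumann's inequality, and the equality case for matrices sharing singular vectors, can be illustrated numerically. This is a minimal sketch; the helper names and the random test matrices are assumptions made for the example.

```python
import numpy as np

def inner(X, Y):
    """Euclidean (Frobenius) inner product: sum of X_ij * Y_ij."""
    return float(np.sum(X * Y))

def sv(X):
    """Singular value vector of X, in non-increasing order."""
    return np.linalg.svd(X, compute_uv=False)

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 6))
Y = rng.standard_normal((4, 6))

lhs = inner(X, Y)
rhs = float(sv(X) @ sv(Y))          # <x, y> for the sorted singular value vectors

# Equality case: give Y's singular values the singular vectors of X.
U, _, Vt = np.linalg.svd(X, full_matrices=False)
Y_aligned = (U * sv(Y)) @ Vt
aligned = inner(X, Y_aligned)
print(lhs <= rhs, abs(aligned - rhs) < 1e-9)
```

The aligned case corresponds to the simultaneous decomposition $X = U \cdot x_\Delta \cdot V'$, $Y = U \cdot y_\Delta \cdot V'$ in Lewis's characterization, which is why equalities such as $\langle X^*, X\rangle = \langle x^*, x\rangle$ hold for subgradients.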



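We finally turn to the proof. By [27, thm. 4.2], $F$ is lower-semicontinuous. Let us
show that $F$ is quasi-convex using the characterization (80). Let $X, Y \in M_{m\times n}$ and
assume that $X^* \in \partial F(X)$ exists such that $\langle X^*, Y - X\rangle > 0$. We now show that for all
$Y^* \in \partial F(Y)$, we have $\langle Y^*, Y - X\rangle \ge 0$. Let $Y^* \in \partial F(Y)$, and let $x$, $y$, $x^*$ and $y^*$ denote
the singular value vectors of $X$, $Y$, $X^*$ and $Y^*$ respectively. Since $X^* \in \partial F(X)$ we have
$\langle X^*, X\rangle = \langle x^*, x\rangle$. Also, by von Neumann's inequality we have $\langle X^*, Y\rangle \le \langle x^*, y\rangle$. We
therefore have

$$\langle x^*, x\rangle = \langle X^*, X\rangle < \langle X^*, Y\rangle \le \langle x^*, y\rangle.$$

Since $f$ is quasi-convex, by assumption, in particular for $y^*$ we have $\langle y^*, x\rangle \le \langle y^*, y\rangle$,
since $y^* \in \partial f(y)$. Also, $\langle Y^*, Y\rangle = \langle y^*, y\rangle$. Again by von Neumann's inequality we
have $\langle Y^*, X\rangle \le \langle y^*, x\rangle$. Together, this gives

$$\langle Y^*, X\rangle \le \langle y^*, x\rangle \le \langle y^*, y\rangle = \langle Y^*, Y\rangle,$$

as required. $\Box$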

Corollary 1. Suppose that $f : W_m \to \mathbb{R}$ is representable as $f(y) = \sum_{i=1}^m f_1(y_i)$ where
$f_1 : \mathbb{R} \to \mathbb{R}$ is a bounded, nondecreasing, lower-semicontinuous function such that
$f_1(-y) = f_1(y)$. Then $F : M_{m\times n} \to \mathbb{R}$ defined by $F(X) = f(\Sigma(X))$ is bounded and
quasi-convex.

Corollary 2. Suppose that $f : \mathbb{R}^m \to \mathbb{R}$ is representable as $f(y) = \sum_{1\le i\neq j\le m} f_2(y_i, y_j)$,
where $f_2 : \mathbb{R}^2 \to \mathbb{R}$ is a quasi-convex, lower-semicontinuous function obeying $f_2(x, y) =
f_2(\pm y, \pm x)$. Then $F : M_{m\times n} \to \mathbb{R}$ defined by $F(X) = f(\Sigma(X))$ is quasi-convex.

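Proof of Lemma 7. Write $SURE_\lambda$ for the SURE corresponding to the SVST estimator
$\hat{X}_\lambda$. By Lemma 15, let $sure_\lambda$ be the function of the singular values such that $SURE_\lambda(Y) =
sure_\lambda(\Sigma(Y))$. Substituting $x(y)_i = (y_i - \lambda)_+$ and $\sigma = 1/\sqrt{n}$ in (79), we get $g(y)_i =
-\min\{y_i, \lambda\}$, hence

$$sure_\lambda(y) = m + \sum_{i=1}^m \left[ \left(\min\{y_i,\lambda\}\right)^2 - \frac{2}{n}\,\mathbf{1}_{\{y_i \le \lambda\}} - \frac{2(n-m)}{n}\cdot\frac{\min\{y_i,\lambda\}}{y_i} \right] \tag{81}$$

$$\qquad\qquad + \frac{2}{n} \sum_{i\neq j} \frac{\min\{y_j,\lambda\}\,y_j - \min\{y_i,\lambda\}\,y_i}{y_i^2 - y_j^2}. \tag{82}$$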



Each of the terms $(\min\{y_i, \lambda\})^2$, $-\mathbf{1}_{\{y_i \le \lambda\}}$ and $-\min\{y_i, \lambda\}/y_i$ is non-decreasing, bounded
and lower-semicontinuous, and therefore, by Corollary 1, the function implicitly defined
by the RHS of (81) is bounded and quasi-convex as required. We now turn to the
function implicitly defined by the sum in (82). Each element in the sum is given by
$(2/n) \cdot f_\lambda(y_j, y_i)$ for some $1 \le i \neq j \le m$, where $f_\lambda : [0, \infty)^2 \to \mathbb{R}$ is given by

$$f_\lambda(x,y) = \frac{\min\{y,\lambda\}\,y - \min\{x,\lambda\}\,x}{x^2 - y^2} = \begin{cases} -1 & 0 \le x, y \le \lambda \\[4pt] \dfrac{\lambda x - y^2}{y^2 - x^2} & 0 \le y \le \lambda,\ x > \lambda \\[8pt] -\dfrac{\lambda}{x+y} & x > \lambda,\ y > \lambda \end{cases} \tag{83}$$

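with $f_\lambda(x, y) = f_\lambda(y, x)$ and $f_\lambda(x, x)$ defined to yield a continuous function. The function
$f_\lambda$ is shown in Figure 6. We now show that $f_\lambda$ can be written as a sum of four
bounded, lower-semicontinuous, quasi-convex functions. Decompose $f_\lambda$ as

$$\begin{aligned} f_\lambda(x,y) = {} & -\mathbf{1}_{[0,\lambda]\times[0,\lambda]}(x,y) \\ & + \frac{\lambda x - y^2}{y^2 - x^2}\cdot \mathbf{1}_{(\lambda,\infty)\times(0,\lambda)}(x,y) \\ & + \frac{\lambda y - x^2}{x^2 - y^2}\cdot \mathbf{1}_{(0,\lambda)\times(\lambda,\infty)}(x,y) \\ & - \frac{\lambda}{x+y}\cdot \mathbf{1}_{(\lambda,\infty)\times(\lambda,\infty)}(x,y). \end{aligned}$$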



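The image of each of these terms lies in $[-1, 0]$, hence each is bounded. To see that
the second (and hence the third) and the fourth terms are quasi-convex, note that for
$0 < c < 1$,

$$\left\{ (x,y) \in [0,\infty)^2 \ \middle|\ \frac{\lambda x - y^2}{y^2 - x^2}\cdot \mathbf{1}_{(\lambda,\infty)\times(0,\lambda)}(x,y) \le -c \right\} = \left\{ (x,y) \in (\lambda,\infty)\times(0,\lambda) \ \middle|\ (1-c)\,y^2 \le -c\,x^2 + \lambda x \right\}$$

and

$$\left\{ (x,y) \in [0,\infty)^2 \ \middle|\ -\frac{\lambda}{x+y}\cdot \mathbf{1}_{(\lambda,\infty)\times(\lambda,\infty)}(x,y) \le -c \right\} = \left\{ (x,y) \in (\lambda,\infty)\times(\lambda,\infty) \ \middle|\ x + y \le \frac{\lambda}{c} \right\}.$$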



These are easily seen to be convex sets in $\mathbb{R}^2$. We conclude that the function $f_\lambda$ can
be decomposed as $f_\lambda = \sum_{s=1}^4 f_\lambda^{(s)}$, where each $f_\lambda^{(s)}$ is quasi-convex. It follows that the
sum (82) can be decomposed into four terms $T_1 + \cdots + T_4$, each of which is a sum of
quasi-convex functions of pairs of singular values, $T_s = \sum_{i\neq j} f_\lambda^{(s)}(y_i, y_j)$. By Corollary
2, each term $T_s$, for $1 \le s \le 4$, is a bounded, quasi-convex function on matrix space: $T_s :
M_{m\times n} \to \mathbb{R}$. It follows that $SURE_\lambda$ is a sum of five bounded, quasi-convex functions
on matrix space, and the proof is complete. $\Box$
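
The pairwise term can be evaluated directly from its unified expression, which makes the three cases, the symmetry and the range $[-1, 0]$ easy to check numerically. The sketch below reads $f_\lambda$ as $(\min\{y,\lambda\}y - \min\{x,\lambda\}x)/(x^2 - y^2)$ for $x \neq y$; the function name and the grid are illustrative assumptions.

```python
import numpy as np

def f_lambda(x, y, lam):
    """Pairwise SURE term for x != y, x, y >= 0:
    (min(y, lam) * y - min(x, lam) * x) / (x**2 - y**2)."""
    num = min(y, lam) * y - min(x, lam) * x
    return num / (x ** 2 - y ** 2)

lam = 1.0
inside = f_lambda(0.3, 0.7, lam)    # both arguments below lam: value -1
outside = f_lambda(3.0, 2.0, lam)   # both arguments above lam: -lam / (x + y)

# Evaluate on an off-diagonal grid to inspect the range of f_lambda.
xs = np.linspace(0.05, 3.0, 30)
ys = xs + 0.013                     # shifted so that x != y at every grid pair
values = np.array([[f_lambda(x, y, lam) for y in ys] for x in xs])
print(inside, outside, values.min(), values.max())
```

On the square $[0,\lambda]^2$ the numerator and denominator cancel up to sign, and above $\lambda$ in both arguments the common factor $x - y$ cancels, which is how the closed-form cases arise.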



Proof of Lemma 15. We first calculate the Jacobian of the SVD, following [28]. For
the reader's convenience, for the remainder of this appendix, consider the case $m \ge n$,
in order to conform to their notation. We avoid differential geometry notation and restrict
ourselves to the more cumbersome, but more widely used, multivariate calculus
notation.

Let $Y = U_Y \cdot y_\Delta \cdot V_Y'$ denote a full ("fat") SVD of $Y \in M_{m\times n}$, where now $m \ge n$
and $y_\Delta \in M_{m\times n}$. We can view $Y \mapsto U_Y$ as a map $M_{m\times n} \to M_{m\times m}$, where the first $n$
columns of $U_Y$ are determined (up to sign of each column) by $Y$ and the last $m - n$
columns constitute an arbitrary completion to an orthonormal basis of $\mathbb{R}^m$. Similarly
$Y \mapsto y$ is a function into the Weyl chamber $\{y \mid y_1 \ge \cdots \ge y_n\} \subset \mathbb{R}^n$ and $Y \mapsto V_Y$ is a
map $M_{m\times n} \to M_{n\times n}$, determined up to sign of each column. As we will see later, the
trace of the Jacobian of the SVD is well defined and does not depend on these arbitrary
choices.

Figure 6: Part of the graph of the function $f_\lambda$ of (83). Left panel: rotated surface plot.
Right panel: some level sets.



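[28] have proposed to calculate the Jacobian of each of these multivariate real functions
as follows. They show that

$$\frac{\partial U_{k,\ell}}{\partial Y_{i,j}} = \left(U \cdot \Omega_U^{i,j}\right)_{k,\ell} \tag{84}$$

$$\frac{\partial V_{k,\ell}}{\partial Y_{i,j}} = -\left(V \cdot \Omega_V^{i,j}\right)_{k,\ell} \tag{85}$$

$$\frac{\partial y_k}{\partial Y_{i,j}} = U_{i,k}V_{j,k} \tag{86}$$

for $1 \le i,k \le m$ and $1 \le j,\ell \le n$. Here, $\Omega_U^{i,j} \in M_{m\times m}$ and $\Omega_V^{i,j} \in M_{n\times n}$ for any
$1 \le i \le m$ and $1 \le j \le n$. [28] show that both $\Omega_V^{i,j}$ and the upper $n \times n$ block of $\Omega_U^{i,j}$
are antisymmetric, and that each pair $\left((\Omega_U^{i,j})_{k,\ell}, (\Omega_V^{i,j})_{k,\ell}\right)$ with $1 \le k,\ell \le n$ satisfies the
$2 \times 2$ linear system

$$\begin{aligned} y_\ell\,(\Omega_U^{i,j})_{k,\ell} + y_k\,(\Omega_V^{i,j})_{k,\ell} &= U_{i,k}V_{j,\ell} \\ y_k\,(\Omega_U^{i,j})_{k,\ell} + y_\ell\,(\Omega_V^{i,j})_{k,\ell} &= -U_{i,\ell}V_{j,k}. \end{aligned} \tag{87}$$

The authors do not explicitly provide the equations that determine the lower $(m-n) \times n$
block of $\Omega_U^{i,j}$. Fortunately, their arguments immediately imply that entries in this block
satisfy

$$y_\ell\,(\Omega_U^{i,j})_{k,\ell} = U_{i,k}V_{j,\ell}, \tag{88}$$

for $1 \le i \le m$, $1 \le j,\ell \le n$ and $n + 1 \le k \le m$.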
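
The singular value derivative (86) is the easiest of these identities to verify numerically, since the product $U_{i,k}V_{j,k}$ does not depend on the sign convention chosen for column $k$. The sketch below compares it with central finite differences for a random $m \times n$ matrix with $m \ge n$; the test matrix, the helper name and the step size `h` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3                                  # m >= n, as in this part of the appendix
Y = rng.standard_normal((m, n))
U, y, Vt = np.linalg.svd(Y, full_matrices=False)
V = Vt.T

def numeric_dy(Y, i, j, h=1e-6):
    """Central finite difference of the singular value vector w.r.t. entry Y[i, j]."""
    E = np.zeros_like(Y)
    E[i, j] = h
    up = np.linalg.svd(Y + E, compute_uv=False)
    down = np.linalg.svd(Y - E, compute_uv=False)
    return (up - down) / (2.0 * h)

# analytic[i, j, k] = U[i, k] * V[j, k], i.e. the claimed value of d y_k / d Y_ij.
analytic = np.einsum('ik,jk->ijk', U, V)
numeric = np.array([[numeric_dy(Y, i, j) for j in range(n)] for i in range(m)])
print(np.max(np.abs(analytic - numeric)))
```

The comparison is meaningful only when the singular values are distinct, which holds almost surely for Gaussian data.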

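We can now evaluate the divergence term in (78).
Define $G(Y) = \hat{X} - Y$ and $g(y) = x - y$. To evaluate $\frac{\partial G(Y)_{i,j}}{\partial Y_{i,j}}$ we note that $G(Y)_{i,j} =
\sum_k U_{i,k}\,g(y)_k\,V_{j,k}$ and hence

$$\frac{\partial G(Y)_{i,j}}{\partial Y_{i,j}} = \sum_k \frac{\partial U_{i,k}}{\partial Y_{i,j}}\,g(y)_k\,V_{j,k} + \sum_k U_{i,k}\,g(y)_k\,\frac{\partial V_{j,k}}{\partial Y_{i,j}} + \sum_k U_{i,k}\,\frac{\partial g(y)_k}{\partial Y_{i,j}}\,V_{j,k}.$$

Since

$$\frac{\partial g(y)_k}{\partial Y_{i,j}} = \sum_\ell \frac{\partial g(y)_k}{\partial y_\ell}\,\frac{\partial y_\ell}{\partial Y_{i,j}} = \sum_\ell \frac{\partial g(y)_k}{\partial y_\ell}\,U_{i,\ell}V_{j,\ell},$$

we conclude that

$$\frac{\partial G(Y)_{i,j}}{\partial Y_{i,j}} = \sum_k \left(U\,\Omega_U^{i,j}\right)_{i,k} g(y)_k\,V_{j,k} - \sum_k U_{i,k}\,g(y)_k \left(V\,\Omega_V^{i,j}\right)_{j,k} + \sum_k U_{i,k}V_{j,k} \left( \sum_\ell \frac{\partial g(y)_k}{\partial y_\ell}\,U_{i,\ell}V_{j,\ell} \right). \tag{89}$$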



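Recall that the Jacobian trace $\mathrm{Tr}\left(\frac{\partial G}{\partial Y}\right) = \sum_{i,j} \frac{\partial G(Y)_{i,j}}{\partial Y_{i,j}}$ is invariant under change of basis
of the underlying linear space $M_{m\times n}$. Consider the orthonormal basis of $M_{m\times n}$ given
by rank-1 matrices, $\{u_i \cdot v_j'\}_{i,j}$, where $u_1, \ldots, u_m$ are the columns of $U$ and $v_1, \ldots, v_n$ are
the columns of $V$, respectively. To calculate the trace in this basis, we formally replace
$U$ and $V$ with identity matrices of their respective dimensions.

The equations that determine $\Omega_U^{i,j}$ and $\Omega_V^{i,j}$ for $1 \le i \le m$ and $1 \le j \le n$, namely (87)
and (88), become

$$y_\ell\,(\Omega_U^{i,j})_{k,\ell} + y_k\,(\Omega_V^{i,j})_{k,\ell} = \delta_{i,k}\,\delta_{j,\ell} \tag{90}$$

for $1 \le k,\ell \le n$, and

$$y_\ell\,(\Omega_U^{i,j})_{k,\ell} = \delta_{i,k}\,\delta_{j,\ell}, \tag{91}$$

for $n + 1 \le k \le m$ and $1 \le \ell \le n$. Similarly, in this basis, (89) becomes

$$\sum_{i=1}^m \sum_{j=1}^n \frac{\partial G(Y)_{i,j}}{\partial Y_{i,j}} = \sum_{i=1}^m \sum_{j=1}^n \sum_{k=1}^n \left[ (\Omega_U^{i,j})_{i,k}\,g(y)_k\,\delta_{j,k} - \delta_{i,k}\,g(y)_k\,(\Omega_V^{i,j})_{j,k} + \delta_{i,k}\,\delta_{j,k} \left( \sum_{\ell=1}^n \frac{\partial g(y)_k}{\partial y_\ell}\,\delta_{i,\ell}\,\delta_{j,\ell} \right) \right].$$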
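Solving (90) and (91) for $(\Omega_U^{i,j})_{i,j}$ and $(\Omega_V^{i,j})_{i,j}$, where $i \neq j$, we get

$$(\Omega_U^{i,j})_{i,j} = \frac{y_j}{y_j^2 - y_i^2}, \qquad 1 \le i,j \le n,$$

$$(\Omega_U^{i,j})_{i,j} = \frac{1}{y_j}, \qquad n + 1 \le i \le m,\ 1 \le j \le n,$$

$$(\Omega_V^{i,j})_{i,j} = -\frac{y_i}{y_j^2 - y_i^2}, \qquad 1 \le i,j \le n,$$

and $(\Omega_U^{i,i})_{i,i} = (\Omega_V^{i,i})_{i,i} = 0$, to the effect that

$$\sum_{i=1}^m \sum_{j=1}^n \frac{\partial G(Y)_{i,j}}{\partial Y_{i,j}} = \sum_{i=1}^n \frac{\partial g(y)_i}{\partial y_i} + \sum_{1 \le i \neq j \le n} \frac{g(y)_j y_j - g(y)_i y_i}{y_j^2 - y_i^2} + (m-n) \sum_{j=1}^n \frac{g(y)_j}{y_j}.$$

Changing back to our original notation of $m \le n$ by exchanging the symbols $m$ and
$n$, and using the general SURE formula (78), we have proved Lemma 15. $\Box$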



